Abstract


Data dissemination systems are difficult to design properly. Because interest in this topic is relatively
new, there is little experience with methods for making performance decisions. This project has
focused on providing tools for understanding the impact of design decisions in dissemination-based data
delivery.

We have developed two broad classes of tools. The first is a set of simulation-based tools that allowed
us to study fundamental algorithms. We have applied these simulators to the problem of information
channelization, and we have built a test harness that controls the deployment of multiple experiments and
collects the results into a convenient graphical form. We see this as a major step toward a commander's
workbench, a tool to help commanders make resource allocation decisions.

The second class of tool is a toolkit that allows us to quickly assemble a prototype (perhaps on a local-area
network) that mimics the ultimate deployed system. A prototype of this kind can be instrumented, and the
performance measurements collected from it give us more realistic insight than a simulator can. This
approach provides the second line of defense.
Technical Report


1  Dissemination-Based  Information  Systems

The proliferation of the Internet and intranets, the development of wireless and satellite networks, and the
availability of asymmetric, high-bandwidth links to the home have fueled the development of a wide range of
new “dissemination-based” applications. These applications involve the timely distribution of data to a large
set of consumers, and include stock and sports tickers, traffic information systems, electronic personalized
newspapers, and entertainment delivery. Dissemination-oriented applications have special characteristics
that render traditional client-server data management approaches ineffective. These include:


•  tremendous scale.

•  a high degree of overlap in user data needs.

•  asymmetric data flow from sources to consumers.


To address the particular needs of dissemination-based applications, we have developed a general framework
for describing and constructing Dissemination-Based Information Systems (DBIS) [Fran97, Fran98].
The framework incorporates a number of data delivery mechanisms and an architecture for deploying them
in a networked environment. The goal is to support a wide range of applications across many varied
environments, such as mobile networks, satellite-based systems, and wide-area networks. By combining the
various data delivery techniques in a way that matches the characteristics of the application and achieves
the most efficient use of the available server and communication resources, the scalability and performance
of dissemination-oriented applications can be greatly enhanced. We have constructed an initial version of a
toolkit that implements this framework, and have demonstrated it at the 1999 ACM SIGMOD International
Conference on the Management of Data [Alti99].


1.1  The  DBIS  Framework 

The basic concepts of the DBIS framework were presented at the OOPSLA '97 conference [Fran97]. A
more recent description appears in [Akso98b]. The framework has two major features. First, it
incorporates a number of different options for data delivery, including traditional request-response,
publish/subscribe, Broadcast Disks, and on-demand broadcast [Akso98a]. Second, it is based on the notion of
network transparency, which allows different data delivery mechanisms to be mixed and matched within a
single application. Network transparency is provided through the use of Information Brokers, which acquire
information and distribute it to other consumers. Brokers are middlemen: a broker acts as a client to some
number of data sources, collects and possibly repackages the data it obtains, and then functions as a data
source to other nodes of the system. Along the way, brokers may add value to the information, for example by
integrating it with data from other sources or enhancing its organizational structure. By creating hierarchies
of brokers, information delivery can be tailored to the needs of many different users.
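The broker role can be illustrated with a minimal Python sketch of a two-level hierarchy. Everything here is hypothetical and for illustration only: the class names `DataSource`, `Broker`, and `Consumer`, the `pull()`/`deliver()`/`publish()` methods, and the `repackage` hook are not the DBIS toolkit's API. A broker pulls from its upstream nodes (acting as a client), optionally adds value, and hands the result to its own subscribers (acting as a source).

```python
from typing import Callable, Dict, List, Sequence, Union

Items = Dict[str, str]  # item name -> payload


class DataSource:
    """Hypothetical leaf source that hands out its current items."""

    def __init__(self, name: str, items: Items) -> None:
        self.name = name
        self.items = items

    def pull(self) -> Items:
        return dict(self.items)


class Consumer:
    """Hypothetical end client that records what it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: Items = {}

    def deliver(self, items: Items) -> None:
        self.received.update(items)


class Broker:
    """Client of its upstream nodes, source for its subscribers."""

    def __init__(self, name: str, upstream: Sequence[Union[DataSource, "Broker"]],
                 repackage: Callable[[Items], Items] = lambda d: d) -> None:
        self.name = name
        self.upstream = list(upstream)
        self.repackage = repackage            # value-adding step (identity by default)
        self.subscribers: List[Consumer] = []

    def pull(self) -> Items:
        merged: Items = {}
        for node in self.upstream:            # client role: collect from every source
            merged.update(node.pull())
        return self.repackage(merged)         # add value before redistributing

    def publish(self) -> None:
        data = self.pull()
        for sub in self.subscribers:          # source role: push to subscribers
            sub.deliver(data)


east = DataSource("east", {"map/ne": "NE radar", "map/se": "SE radar"})
west = DataSource("west", {"map/nw": "NW radar"})
regional = Broker("regional", [east, west])
national = Broker("national", [regional],
                  repackage=lambda d: {k: v + " [CONUS]" for k, v in d.items()})
user = Consumer("user")
national.subscribers.append(user)
national.publish()
print(sorted(user.received))
```

Because `national` pulls from `regional` exactly as it would from a data source, further levels can be stacked without either side knowing how the other obtains its data.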


1.2  The  DBIS  Toolkit  and  Example  Application 


We have developed an initial prototype of a toolkit that implements the DBIS architecture. The toolkit is
written in Visual C++ and runs on Windows NT. It exploits the IP-Multicast support that is included with
NT 4.0. The toolkit consists of 30,000 lines of code, although a portion of this consists of code generated by
the Visual C++ tools for user interface functions. The toolkit is described in more detail in [Alti99].


The toolkit provides a set of application programming interfaces (APIs) and libraries that allow a
developer to construct and experiment with a DBIS application. The DBIS-Toolkit consists of four main
components:
Data Source (DS) Library - a data source wrapper that encapsulates network communication and
provides conversion functions for data.

Client  Library  -  a  client  program  wrapper  that  encapsulates  network  communication  and  provides 
conversion  functions  for  queries  and  user  profiles.  It  also  provides  monitoring  and  filtering  of  broadcast  or 
multicast  channels. 

Information  Broker  (IB)  -  the  main  component  of  the  DBIS-Toolkit.  The  IB  contains  communication, 
buffering,  scheduling,  and  catalog  management  components  and  is  described  in  more  detail  below. 

Information  Broker  Master  -  The  IB  Master  is  responsible  for  managing  global  catalog  information 
about  data  and  the  topology  of  the  DBIS.  All  IBs  must  register  with  the  IB  Master  and  all  catalog  updates 
must  be  sent  to  the  IB  Master. 
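The IB Master's bookkeeping can be sketched as follows. This is a hypothetical illustration, not the toolkit's interface: the `IBMaster` class and its `register`, `update_catalog`, and `providers` methods are invented names, and the catalog here only records which IB offers which data item.

```python
from typing import Dict, List, Set


class IBMaster:
    """Hypothetical global catalog and topology manager."""

    def __init__(self) -> None:
        self.topology: Dict[str, List[str]] = {}  # IB name -> upstream node names
        self.catalog: Dict[str, Set[str]] = {}    # data item -> IBs that provide it

    def register(self, ib: str, upstream: List[str]) -> None:
        """An IB registers, declaring its upstream links, before anything else."""
        self.topology[ib] = list(upstream)

    def update_catalog(self, ib: str, item: str) -> None:
        """All catalog updates go through the master; unregistered IBs are rejected."""
        if ib not in self.topology:
            raise ValueError(f"{ib} has not registered with the IB Master")
        self.catalog.setdefault(item, set()).add(ib)

    def providers(self, item: str) -> Set[str]:
        return set(self.catalog.get(item, set()))


master = IBMaster()
master.register("ib-regional", upstream=["ds-east", "ds-west"])
master.register("ib-national", upstream=["ib-regional"])
master.update_catalog("ib-regional", "map/ne")
master.update_catalog("ib-national", "map/ne")
print(sorted(master.providers("map/ne")))
```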

In addition to these four components, the toolkit contains a flexible performance monitoring and instrumentation
interface that can be used to graphically display real-time performance metrics such as bandwidth
and CPU utilization, response times, etc. on a per-IB basis. The instrumentation tool also allows more
application-specific metrics to be obtained and displayed.

We have used the toolkit to construct a CONUS weather map dissemination application, in which weather
maps for various regions of the Continental United States can be delivered to large numbers of users via
push/pull, multicast/unicast, and their various combinations. Perhaps most impressively, we demonstrated
how the data delivery mechanisms between various components can be changed on-the-fly, without requiring
the application to be restarted, and without impacting components that were not directly connected to the
changed data flow.


2  On-Demand  Broadcasting 

As described above, one of the many possible mechanisms for data dissemination uses on-demand (i.e.,
aperiodic pull) broadcast of data. In a typical scenario, two independent networks are used: a terrestrial
network for sending pull requests to the server, and a “listen only” satellite downlink over which the server
broadcasts data to all of the clients. When a client needs a data item (e.g., a web page, database object,
map, etc.) that it cannot find locally, it sends a request for the item to the server. Client requests are
queued up (if necessary) at the server upon arrival. The server repeatedly chooses an item from among these
requests, broadcasts it over the satellite link, and removes the associated request(s) from the queue. Clients
monitor the broadcast and receive the item(s) that they require.


2.1  Scheduling  with  RxW 

In  a  large-scale  implementation  of  such  a  system,  an  important  consideration  is  the  scheduling  algorithm  that 
the  server  uses  to  choose  which  request  to  service  from  its  queue  of  waiting  requests.  We  have  developed  a 
novel  on-demand  broadcast  scheduling  algorithm,  called  RxW  [Akso98a],  which  is  a  practical,  low-overhead 
and  scalable  approach  that  provides  excellent  performance  across  a  range  of  scenarios. 

The intuition behind the RxW scheduling algorithm is to provide balanced performance for hot (popular)
and cold (not so popular) pages. This intuition is based on our observations of previously proposed
algorithms, which failed either because they favored one class of items over the other or because they were too
expensive to be used in a real system. The RxW algorithm schedules the page with the maximal RxW
value, where R is the number of outstanding requests for that page and W is the amount of time that the
oldest of those requests has been waiting for the page. Thus, RxW schedules a page either because it has many
outstanding requests or because there is at least one request that has waited for a long time.
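The RxW rule itself is easy to state in code. The following is a minimal, brute-force Python sketch that scans every pending page on each decision; it omits the sorted-list pruning described next. The class and method names are hypothetical, requests are tracked only as a per-page count and oldest arrival time, and time is a plain float supplied by the caller.

```python
from typing import Dict, Optional


class RxWScheduler:
    """Brute-force RxW: broadcast the pending page with the largest R * W."""

    def __init__(self) -> None:
        self.r: Dict[str, int] = {}          # page -> number of outstanding requests (R)
        self.oldest: Dict[str, float] = {}   # page -> arrival time of its oldest request

    def request(self, page: str, now: float) -> None:
        self.r[page] = self.r.get(page, 0) + 1
        self.oldest.setdefault(page, now)    # only the first (oldest) arrival is kept

    def next_broadcast(self, now: float) -> Optional[str]:
        if not self.r:
            return None
        # W is how long the oldest outstanding request has been waiting.
        page = max(self.r, key=lambda p: self.r[p] * (now - self.oldest[p]))
        # One broadcast satisfies every pending request for the page.
        del self.r[page]
        del self.oldest[page]
        return page


s = RxWScheduler()
for t in (8.0, 9.0, 9.5):
    s.request("hot", now=t)      # R = 3, oldest request at t = 8.0
s.request("cold", now=0.0)       # R = 1, oldest request at t = 0.0
first = s.next_broadcast(now=10.0)
second = s.next_broadcast(now=10.0)
print(first, second)
```

At time 10 the cold page is broadcast first (1 × 10 = 10 beats 3 × 2 = 6), showing how a single long-waiting request can outrank a burst of recent requests for a hot page.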

The search is made efficient by using two sorted lists of requests (one ordered by R values and
the other ordered by W values) threaded through the service queue. The algorithm prunes the search space
by alternating between the two lists, each time using the best RxW value found so far to bound how far the
search must proceed on the other list. When the limit on one of the lists is reached, the page with the maximal
RxW has been found and can be broadcast. In our experiments, this pruning technique reduced the size of the
search space by over 70%.
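The pruned search can be sketched with a threshold-style stopping rule: after examining the first k entries of both lists, any page not yet seen has R no larger than entry k of the R list and W no larger than entry k of the W list, so their product bounds every remaining RxW value. This minimal Python sketch relies on that argument and may differ in detail from the procedure in [Akso98a]. It sorts on every call purely to stay self-contained, whereas the lists described above are maintained incrementally in the service queue; `Entry` and `rxw_pruned` are hypothetical names, and it assumes R ≥ 0 and W ≥ 0.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Entry:
    page: str
    r: int     # outstanding requests for the page (R)
    w: float   # waiting time of its oldest request (W)


def rxw_pruned(entries: List[Entry]) -> Tuple[Optional[Entry], int]:
    """Return (entry with maximal r * w, number of distinct entries examined)."""
    by_r = sorted(entries, key=lambda e: e.r, reverse=True)
    by_w = sorted(entries, key=lambda e: e.w, reverse=True)
    n = len(entries)
    best: Optional[Entry] = None
    best_score = float("-inf")
    seen: Set[str] = set()
    k = 0
    while k < n:
        # Alternate between the lists: one step down each.
        for e in (by_r[k], by_w[k]):
            seen.add(e.page)
            if e.r * e.w > best_score:
                best, best_score = e, e.r * e.w
        k += 1
        # An unseen page sits at position >= k in both lists, so its
        # RxW value is at most by_r[k].r * by_w[k].w.
        if k < n and best_score >= by_r[k].r * by_w[k].w:
            break
    return best, len(seen)


sample = [Entry("a", 10, 1.0), Entry("b", 1, 10.0), Entry("c", 5, 5.0), Entry("d", 1, 1.0)]
print(rxw_pruned(sample))
```

On this four-page sample the search stops after examining three pages, because once page c (5 × 5 = 25) is found, no remaining page can exceed 1 × 1.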


We also developed an approximation-based version of the algorithm to provide even further reductions
in scheduling time with only minimal impact on scheduling quality. By varying a single parameter, this
algorithm can be tuned from having the same behavior as the RxW algorithm described previously to being
a constant-time approach.
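One way a single parameter can span this range is sketched below, under the assumption (made purely for illustration) that a parameter `alpha` scales a running average of previously chosen RxW values and that the first page found whose RxW reaches that threshold is broadcast; see [Akso98a] for the actual approximation. `ApproxRxW` and `choose` are hypothetical names. The lists are re-sorted on each call to keep the example self-contained; the constant-time behavior only holds when the sorted lists are maintained incrementally.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, int, float]  # (page, R, W)


def score(e: Entry) -> float:
    return e[1] * e[2]


class ApproxRxW:
    """Hypothetical approximate RxW with a single tuning knob, alpha.

    alpha = 0 accepts the first candidate examined; a very large alpha almost
    never accepts early, so the scan runs to completion and returns the exact
    RxW choice.
    """

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.avg = 0.0     # running average of chosen R*W values
        self.count = 0

    def choose(self, entries: List[Entry]) -> Optional[Entry]:
        by_r = sorted(entries, key=lambda e: e[1], reverse=True)
        by_w = sorted(entries, key=lambda e: e[2], reverse=True)
        # With no history yet, fall back to an exact search.
        threshold = self.alpha * self.avg if self.count else float("inf")
        best: Optional[Entry] = None
        for k in range(len(entries)):
            for e in (by_r[k], by_w[k]):        # alternate between the two lists
                if score(e) >= threshold:
                    self._record(e)
                    return e
                if best is None or score(e) > score(best):
                    best = e
        if best is not None:
            self._record(best)
        return best

    def _record(self, e: Entry) -> None:
        self.count += 1
        self.avg += (score(e) - self.avg) / self.count


sample = [("a", 10, 1.0), ("b", 1, 10.0), ("c", 5, 5.0), ("d", 1, 1.0)]
fast, exact = ApproxRxW(alpha=0.0), ApproxRxW(alpha=1e9)
print([fast.choose(sample)[0] for _ in range(2)],
      [exact.choose(sample)[0] for _ in range(2)])
```

Both schedulers see the same queue state twice. After its first (exact) decision, the `alpha = 0` scheduler accepts whatever it examines first (page a), while the large-`alpha` scheduler keeps returning the exact RxW choice (page c).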


We  implemented  RxW  and  its  approximate  version  on  our  dissemination  toolkit.  Experiments  with  the 
toolkit  verified  the  effectiveness  and  efficiency  of  RxW  and  showed  that  for  our  particular  configuration,  the 
approximate  version  was  able  to  significantly  outperform  the  full  version,  providing  fast  scheduling  while 
still  producing  a  high-quality  broadcast  schedule. 


2.2  Data  Staging 

While the RxW algorithm is a practical approach to on-demand broadcast scheduling, it, like all previous work
on broadcast scheduling, does not account for the need to obtain the items before they can be broadcast.
In many large-scale applications, data may not be available immediately when required by the scheduler.
Many applications involve large amounts of data that cannot be cost-effectively stored in
main memory. Furthermore, in a wide-area distributed system such as the WWW, the data to be broadcast
is likely to reside at a remote site. In either case, data items must be retrieved and brought into the server's
main memory before they can be broadcast. The need to fetch data from various locations produces large
variance in service times, which can destroy the performance of traditional scheduling heuristics. Thus, a
communications-centric approach that ignores data management issues can result in significant degradation
of broadcast efficiency. For this reason, we have investigated the coordination of broadcast scheduling with
the management of the data items to be broadcast. We refer to this integrated functionality as data staging.

We have developed three complementary approaches to data staging. All three approaches exploit the
information on page popularity that is maintained by RxW and have been integrated with RxW in our
dissemination testbed. The three data staging approaches are the following:


•  Opportunistic Scheduling: In a large-scale broadcast system, broadcast bandwidth is the key shared
resource, and thus it is crucial to utilize it to the fullest extent. It is a well-known property of broadcast
scheduling that the optimal allocation of bandwidth to items is proportional to the square
roots of their access probabilities. The practical implication of this is that broadcast effectiveness
is not greatly affected by small deviations from the optimal allocation. We exploit this property by
sometimes broadcasting suboptimal but memory-resident data items while the scheduled items are
being brought into the server's cache. We have developed three alternative approaches for choosing
these substitute items to be broadcast.

•  Caching:  An  obvious  way  to  reduce  the  need  for  fetching  data  items  is  to  make  the  best  use  of  the 
available  memory  space  on  the  server.  The  key  to  successful  caching  is  to  retain  those  items  that  are 
most  likely  to  be  scheduled.  The  RxW  algorithm  is  able  to  provide  very  good  hints  for  differentiating 
between  hot  and  cold  items.  We  exploit  this  property  to  make  intelligent  caching  decisions. 

•  Prefetching: Another method to reduce access latency is to predict which items will be broadcast in
the near future and bring them into the cache before they are actually scheduled for broadcast. Since
it is the responsibility of the caching policy to keep hot items available, prefetching focuses only on
cold items, which are not likely to be in the cache. The RxW algorithm can help identify cold items
that are likely to be broadcast in the near future.
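The flatness property that opportunistic scheduling relies on can be checked numerically. The sketch below assumes equal-sized items, a perfectly regular broadcast in which item i receives a fraction f_i of the slots spaced 1/f_i apart, and requests arriving at random times, so a request for item i waits 1/(2 f_i) slots on average. Under those assumptions the mean wait Σ p_i / (2 f_i) is minimized by the square-root rule f_i ∝ √p_i. The access probabilities are hypothetical.

```python
import math
from typing import List


def mean_wait(p: List[float], f: List[float]) -> float:
    """Expected wait, in broadcast slots, when item i gets a fraction f[i] of
    the slots spaced evenly 1/f[i] apart: a random request waits 1/(2 f[i])."""
    return sum(pi / (2 * fi) for pi, fi in zip(p, f))


def sqrt_rule(p: List[float]) -> List[float]:
    """Bandwidth shares proportional to the square roots of access probabilities."""
    s = sum(math.sqrt(pi) for pi in p)
    return [math.sqrt(pi) / s for pi in p]


p = [0.4, 0.3, 0.2, 0.1]               # hypothetical access probabilities
f_opt = sqrt_rule(p)

# Deviate from the optimum: 10% more bandwidth for the hottest item,
# 10% less for the coldest, renormalized so the shares still sum to 1.
raw = [f_opt[0] * 1.1, f_opt[1], f_opt[2], f_opt[3] * 0.9]
f_dev = [x / sum(raw) for x in raw]

w_opt, w_dev = mean_wait(p, f_opt), mean_wait(p, f_dev)
print(f"optimal {w_opt:.3f}, perturbed {w_dev:.3f}, "
      f"increase {100 * (w_dev / w_opt - 1):.2f}%")
print(f"proportional to p: {mean_wait(p, p):.3f}")
```

With these numbers the optimal mean wait is about 1.89 slots, and the 10% deviation raises it by less than half a percent, while allocating bandwidth in direct proportion to p gives exactly 2.0 slots.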

Our  performance  experiments  using  both  synthetic  workloads  and  WWW  server  traces  have  shown  that 
data  staging  concerns  are  indeed  crucial,  and  that  these  approaches  are  effective  (to  varying  extents)  in 
providing  substantial  performance  improvements  for  on-demand  broadcast. 
3  Self-Adaptive  User  Profiles  for  Large-Scale  Data  Delivery

User profiles, which encode the data needs and interests of users, lie at the heart of any dissemination-based
information system. From the user's viewpoint, a profile provides a means of passively retrieving
relevant information. A user can submit a profile to a push-based system once, and then continuously
receive data that are relevant to him or her in a timely fashion without the need to submit the same
query over and over again. This automatic flow of relevant information helps the user keep pace with the
ever-increasing rate of information generation. From the system's point of view, profiles fulfill a role similar
to that of queries in database or information retrieval systems. In fact, profiles are a form of continuously
executing query. In a large data dissemination system, the storage and access of user profiles can be
resource-intensive. Additionally, because user interests change over time, the profiles must
be updated accordingly to reflect up-to-date information needs.

We have developed an algorithm called Multi-Modal (MM) for incrementally constructing and maintaining
user profiles for filtering text-based data items [Ceti99]. MM can be tuned to trade off effectiveness (i.e.,
accuracy of the filtered data items) against efficiency of profile management. The algorithm receives relevance
feedback from users about the documents that they have seen (i.e., a binary indication of
whether or not each document was considered useful), and uses this information to improve the current profile.
One important aspect of MM is that it represents a user profile as multiple keyword vectors whose size and
elements change dynamically based on user feedback.

In fact, it is this multi-modal representation of profiles that allows MM to trade off effectiveness and
efficiency. More specifically, the algorithm can be tuned using a threshold parameter to produce profiles
of different sizes. Consider the two boundary values of this threshold parameter to illustrate this
tradeoff. When the threshold is set to 0, a user profile is represented by a single keyword vector, achieving
an extremely low overhead for profile management but seriously limiting the effectiveness of the profile. At
the other extreme, if the threshold is set to 1, we achieve an extremely fine-grained user model; however,
the profile size equals the number of relevant documents observed by the user, making it impractical to store
and maintain profiles. Therefore, it is more desirable to consider intermediate threshold values that
provide the best effectiveness/efficiency tradeoff for a given application.
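The boundary behavior described above can be reproduced with a simplified multi-vector profile. In the sketch below, which is an illustrative assumption and not the MM algorithm of [Ceti99] (it ignores non-relevant feedback and term weighting, for example), a relevant document is merged into its most similar profile vector when their cosine similarity reaches the threshold and otherwise starts a new vector. `MultiModalProfile`, `relevant`, and `score` are hypothetical names; documents are sparse term-weight dictionaries with non-negative weights.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse keyword vector: term -> non-negative weight


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MultiModalProfile:
    """Simplified multi-vector profile driven by positive relevance feedback."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.vectors: List[Vector] = []

    def relevant(self, doc: Vector) -> None:
        """Fold a document the user marked as useful into the profile."""
        if self.vectors:
            closest = max(self.vectors, key=lambda v: cosine(v, doc))
            if cosine(closest, doc) >= self.threshold:
                for term, w in doc.items():      # merge: add term weights
                    closest[term] = closest.get(term, 0.0) + w
                return
        self.vectors.append(dict(doc))           # otherwise start a new mode

    def score(self, doc: Vector) -> float:
        """Filtering score: similarity to the closest profile vector."""
        return max((cosine(v, doc) for v in self.vectors), default=0.0)


docs = [{"a": 1.0, "b": 1.0}, {"b": 1.0, "c": 1.0}, {"c": 1.0, "d": 1.0}]
for threshold in (0.0, 0.4, 1.0):
    profile = MultiModalProfile(threshold)
    for d in docs:
        profile.relevant(d)
    print(threshold, len(profile.vectors))
```

With these three documents the profile holds one, two, and three vectors for thresholds 0, 0.4, and 1, respectively: threshold 0 always merges, while threshold 1 keeps one vector per relevant document.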

We evaluated the utility of MM by experimentally investigating its ability to categorize pages from the
WWW. In particular, we tested its ability to learn (human-generated) categories provided by the Yahoo!
index. Our focus was on the tradeoffs between profile sizes and effectiveness (using non-interpolated average
precision as our primary effectiveness metric). The evaluation demonstrated that MM can achieve significantly
higher precision values with only a modest increase in profile sizes. Additionally, MM was able to
achieve precision values with small profiles that were comparable to, or in some cases even better than, those
obtained with maximum-sized profiles. The details of the algorithm, experimental setting, and results
are discussed in [Ceti99].


4  Information  Quality  Metrics 

A subproject of this work has investigated the connections between data dissemination policy and task
utility. While there are various quality metrics that can be gathered directly on an information stream,
those metrics might not translate directly into the “quality” with which an information-dependent task is
performed.

One of the aims of the subproject was to gain insight into how the intuitive notion of the “importance”
of an information stream can be translated into a lower-level system control policy. Some proposals that we
saw put forward in the BADD implementation effort, such as interpreting importance as straight priority,
seemed to have unpleasant consequences, such as task starvation.

To get a better qualitative and quantitative handle on task utility, we concentrated on a particular category
of task, namely those that depend strongly on a situation estimate. By “situation estimate”, we mean a
computer-based model of some part of the physical world. Examples of situation estimates might include
terrain elevation and cover, major lines of communication for vehicles, and location and patterns of emission
of radio broadcast sites. By “strong dependence” of a task on a situation estimate, we mean a task whose
performance directly correlates with the accuracy and currency of the estimate. For example, a task that must
detect when red-force vehicles cross a certain boundary could strongly depend on an estimate of vehicle tracks.

One result of our analysis of the utility of information streams for such tasks is the recognition that utility
cannot simply be assigned to individual information items in a stream. There are several reasons that utility
can't be assigned to items in isolation. One is that the effect of losing one item can depend on whether
other items in the information stream were received at the task site. Losing one report on a vehicle position
might have a minor effect if reports three seconds earlier and three seconds later were received. On the other
hand, loss of one information item might render others useless, or of much diminished value. For example,
if two listening stations are reporting the time, bearing, and frequency of radio broadcasts, losing a report
from one may make the corresponding report from the other useless for purposes of locating an emitter by
triangulation.

In addition, the relative contribution of a given information item might depend on the latency with which
it is received. A position report on a vehicle will lose value as it is delayed, especially after a subsequent
report is received on the same vehicle.

Thus, schemes for managing information streams that are based on assigning static values to individual
information items are likely to result in suboptimal dissemination policies. Rather, such policies need to be
evaluated by their effect on an information stream as a whole, as it influences task performance.

To gain a better understanding of these issues, we have begun a simulation effort to evaluate different
channel-scheduling and item-dropping schemes relative to task performance. As an initial assumption,
we have approximated task performance by situation estimate accuracy. We essentially measure
the integral of estimate error over time (so higher scores mean poorer task performance).
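The measure can be illustrated with a toy one-dimensional track. The sketch below is hypothetical throughout: a vehicle moves one unit per time step, a position report is sent every third step, the task's estimate simply holds the last report received, and the score is a rectangle-rule integral of absolute estimate error (higher is worse). It also exhibits the non-additivity of item utility discussed above.

```python
from typing import Iterable, List, Optional


def error_integral(truth: List[float], reports: List[Optional[float]],
                   initial_estimate: float = 0.0, dt: float = 1.0) -> float:
    """Rectangle-rule integral of |true position - estimate| over time.

    The estimate holds the most recently received report;
    None marks a time step with no (or a lost) report.
    """
    estimate = initial_estimate
    total = 0.0
    for actual, report in zip(truth, reports):
        if report is not None:
            estimate = report
        total += abs(actual - estimate) * dt
    return total


truth = [float(t) for t in range(12)]                            # moves 1 unit per step
sent = [x if t % 3 == 0 else None for t, x in enumerate(truth)]  # report every 3 steps


def lose(steps: Iterable[int]) -> List[Optional[float]]:
    dropped = set(steps)
    return [None if t in dropped else r for t, r in enumerate(sent)]


base = error_integral(truth, sent)
for lost in ([6], [9], [6, 9]):
    print(lost, error_integral(truth, lose(lost)) - base)
```

Each report lost on its own adds 9 units of accumulated error, but losing both adds 27 rather than 18, so the value of one report depends on which other reports arrive.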

Our initial simulations are looking at different channel-scheduling policies relative to this measure. We
have pure-priority, round-robin, and weighted-deficit round-robin schedulers running. Initial simulations
reveal that pure-priority scheduling benefits the task using the highest-priority information stream, to the
detriment of all tasks at any lower priority. Round-robin ensures that all tasks get a share of resources
for their information streams, but fails to reflect the relative importance of tasks. Weighted-deficit round-robin is
a so-called “proportional share” scheduling algorithm, and allows more important streams to receive a larger
share of resources, while not starving or arbitrarily delaying other streams.
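Weighted deficit round-robin can be sketched as follows. This is a minimal Python sketch of the standard deficit round-robin scheme with per-stream quanta acting as weights; the stream names, item sizes, and quanta are hypothetical, and it is not the simulator's code. Each visit adds a stream's quantum to its deficit counter, and the stream sends head-of-line items while they fit within the counter.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Tuple


def weighted_drr(queues: Dict[str, Deque[int]], quantum: Dict[str, int],
                 rounds: int) -> List[Tuple[str, int]]:
    """Serve `rounds` rounds of deficit round-robin; return (stream, size) in send order."""
    deficit = {name: 0 for name in queues}
    sent: List[Tuple[str, int]] = []
    for _ in range(rounds):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0            # idle streams do not bank credit
                continue
            deficit[name] += quantum[name]   # larger quantum = larger share
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0
    return sent


queues = {"high": deque([100] * 50), "low": deque([100] * 50)}
order = weighted_drr(queues, {"high": 300, "low": 100}, rounds=5)
print(Counter(name for name, _ in order))
```

The higher-weight stream receives three times the service of the lower one, yet the lower stream is still served in every round, unlike under pure priority.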

Currently,  different  streams  are  served  on  a  first-come,  first-served  basis  and  queues  of  information  items 
can  grow  without  bound.  Planned  extensions  will  incorporate  dropping  policies  and  reordering  of 

5  Channelization 

In  a  publish-subscribe  system,  a  profile  describes  a  client’s  interest  in  a  set  of  data  items,  and  these  profiles 
are  used  by  a  server  (i.e.,  a  data  source)  to  send  data  to  appropriate  clients. 

Often, the information that is sent to clients is broken up into a number of channels. A channel is a
transmission medium with fixed bandwidth over which data can be sent. Typically, clients can listen to a
constrained set of channels. These channels can be physical manifestations requiring specialized tuners at
the client end (e.g., satellite systems), or they can be virtual, created by multiplexing multiple channels on a
single physical channel. Virtual channels provide a way of narrowing the focus of what a client
must filter.


5.1  Simulator  Structure 

We have constructed two simulation models to study a satellite-based publish-subscribe system. Each
has the following fundamental characteristics. A data source produces updates to data based on some update
distribution with exponentially distributed interarrival times. Clients are each connected to some number of satellite
channels, and a delivery mechanism uses client profiles to send data source updates on channels so as to
satisfy all client interests. The delivery mechanism contains three pieces. The first is a Profiler, which matches
updates with profiles to determine whether there exists a client interested in the update. Next is the Mapper,
which determines the channels on which to send updates based on the channels to which interested clients are
connected. Last is the Scheduler, which determines the order in which to send updates.

The  first  model  looks  at  two  problems.  First,  how  are  clients  assigned  to  channels,  and  second,  how  are 
updates  mapped  to  channels. 

Client-to-channel assignment is a challenging problem because both client interests and data update rates
have to be considered while attempting to make efficient use of bandwidth. This goal is achieved by trying
to group clients with similar profiles. The success of an algorithm is measured by the clients getting less of
what they don't want, and the Mapper making fewer copies of updates.

Page-to-channel mapping is also an interesting problem. Based on a client-to-channel assignment, the
Mapper determines the channels onto which updates are mapped. The mapping can be either static or dynamic. With
a static mapping, the channels that a data item is mapped to are fixed. With dynamic mapping, the channels
that a data item is mapped to are determined when the update is received by the Mapper. The goal of both
types of algorithms is to minimize the number of channels an update is sent on, and also to reduce the load
across all channels.
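For dynamic mapping, one simple way to keep the number of copies low is to treat the channels chosen for an update as a set cover over the interested clients. The sketch below pairs a toy Profiler (exact item-name matching) with a greedy Mapper; greedy set cover is a standard approximation that need not find the minimum, and the client, channel, and item names, as well as the functions `interested_clients` and `dynamic_map`, are hypothetical rather than the simulators' own algorithms. A static mapper, by contrast, would simply look the item up in a fixed table.

```python
from typing import Dict, List, Set


def interested_clients(item: str, profiles: Dict[str, Set[str]]) -> Set[str]:
    """Profiler: clients whose profile mentions the updated item."""
    return {client for client, items in profiles.items() if item in items}


def dynamic_map(clients: Set[str], listens: Dict[str, Set[str]]) -> List[str]:
    """Mapper: greedily pick channels until every interested client is covered."""
    uncovered = set(clients)
    candidates = sorted({ch for c in clients for ch in listens[c]})
    chosen: List[str] = []

    def gain(ch: str) -> int:
        return sum(1 for c in uncovered if ch in listens[c])

    while uncovered:
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("an interested client listens to no channel")
        chosen.append(best)
        uncovered = {c for c in uncovered if best not in listens[c]}
    return chosen


profiles = {"c1": {"wx/ne"}, "c2": {"wx/ne"}, "c3": {"wx/ne"}, "c4": {"wx/sw"}}
listens = {"c1": {"ch1", "ch2"}, "c2": {"ch1", "ch2"}, "c3": {"ch2"}, "c4": {"ch3"}}
targets = interested_clients("wx/ne", profiles)
print(sorted(targets), dynamic_map(targets, listens))
```

Here a single copy on ch2 reaches all three interested clients, whereas sending the update on every channel an interested client listens to would make two copies.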

The  second  model  reverses  the  approach  of  the  first  model.  Data  items  are  first  mapped  to  channels,  and 
based  on  this  mapping  clients  select  channels  on  which  to  connect. 

Here, the mapper effectively accomplishes both the task of client-to-channel assignment and page-to-channel
mapping with one algorithm. The mapper assigns data items to channels based on knowledge of
profiles and data item update rates. It attempts to cluster profiles so as to minimize the load across channels
while also attempting to reduce the number of channels on which each client has to listen.

Client-to-channel assignment can be done in two ways: the data delivery mechanism can tell each client
on which channels it should listen, or, with a guide of page-to-channel mappings, each client can select
appropriate channels.

5.2  Simulator  Measurements 

We use average staleness as our metric of how well we are doing. The staleness of an update at a client is the
time at which the client receives the update minus the time at which the update actually occurred; average
staleness is this value averaged over all clients that received that update. This represents an average delay for
items to work their way through the system. An important realization is that once the flow into the system
exceeds a rate at which one or more of the queues becomes unstable, our average staleness measurements become
meaningless. Thus, we use a differential flow rate (DFR) to determine when the system becomes unstable.
It is the point at which the DFR becomes non-zero that is of interest. In other words, an algorithm that
delays this point is doing better than one that does not.
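Why instability makes average staleness meaningless can be seen in a deliberately simple model: a single FIFO channel with deterministic update arrivals and fixed transmission times. The model and the `fifo_staleness` function are hypothetical, not one of the simulators described here; staleness is the receive time minus the update time, as defined above.

```python
def fifo_staleness(n_updates: int, interarrival: float, service: float) -> float:
    """Average staleness of n updates sent over one FIFO channel."""
    free_at = 0.0     # time at which the channel finishes its current transmission
    total = 0.0
    for k in range(n_updates):
        arrival = k * interarrival                # update occurs
        start = max(arrival, free_at)             # waits if the channel is busy
        free_at = start + service                 # client receives it here
        total += free_at - arrival                # staleness of this update
    return total / n_updates


for service in (0.5, 1.5):                        # faster / slower than arrivals
    print(service, [fifo_staleness(n, 1.0, service) for n in (100, 1000)])
```

When transmission keeps up with arrivals, the average is 0.5 regardless of run length; when it does not, the queue grows without bound and the “average” (26.25 after 100 updates, 251.25 after 1000) mostly reflects how long the run lasted.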

In  the  first  simulation  model,  we  have  studied  various  client-to-channel  assignment  algorithms.  We  have 
shown  that  the  choice  of  an  assignment  mechanism  here  is  related  to  the  choice  of  a  mapper  algorithm  at  the 
server.  We  have  also  studied  several  server  mapping  algorithms.  One  study  shows  that  if  D  is  a  measure  of 
channel  load  and  C  is  a  measure  of  the  number  of  copies  that  are  made,  an  algorithm  based  on  a  combined 
metric  such  as  DC  outperforms  one  that  considers  only  D  or  one  that  considers  only  C. 

Both of these simulators are working quite well and provide a good testbed for further study of channelization
algorithms. This includes algorithms for profile matching, channel assignment, channel mapping,
and channel scheduling.
Abstract: Recent advances in telecommunications have enabled the deployment of broadcast-based wide-area
information services that provide on-demand data access to very large client populations. In order to
effectively utilize a broadcast medium for such a service, it is necessary to have efficient, on-line scheduling
algorithms that can balance individual and overall performance, and can scale in terms of data set sizes,
client populations, and broadcast bandwidth. In this study we introduce a parameterized algorithm that
provides good performance across all of these criteria and can be tuned to emphasize either average or
worst-case waiting time. Unlike previous work on low-overhead scheduling, the algorithm is not based on
estimates of the access probabilities of items; rather, it makes scheduling decisions based on the current
queue state, allowing it to easily adapt to changes in the intensity and distribution of the workload. We
examine the performance of the algorithm using a simulation model.

I. Introduction
A. On-demand Data Broadcast

Broadcast-based information systems are becoming increasingly popular due to advances in telecommunications,
interconnectivity, and mobile computing. Compared to traditional unicast data transfer, broadcasting can be
much more efficient for disseminating information to large numbers of clients in applications where there is
a high degree of commonality among client interests. With unicast, a data item must be transmitted at least
once for every client who requests it, resulting in scalability problems as the client population increases.
The World Wide Web has provided numerous examples of such situations, such as election result servers,
sporting event kiosks, stock market tickers, etc. The access delays associated with such sites on the WWW
during periods of heavy use demonstrate the inefficiencies of unicast delivery for dissemination-oriented
applications.

The advantage of broadcast for data dissemination is that each transmission of an item can satisfy the needs of potentially many clients. Several forms of data broadcasting have already been introduced commercially. Intel has been broadcasting data along with normal TV signals [Intel94]. Hughes Network Systems is using satellites for delivering Internet content [DirecPC] and plans to incorporate broadcast technology. Broadcasting using cable technology is being developed by Hybrid Networks Inc. [Hybrid] and others. There has also been tremendous improvement in the bandwidth that is available for data broadcast. For example, the Teledesic system is expected to provide bandwidth from 155.52 Mbps up to 1.244 Gbps [Teledesic].

This work has been partially supported by the NSF under grant IRI-9501353, by Rome Labs agreement number F30602-97-2-0241 under DARPA order number F078, and by research grants from Intel and NEC.


B.  Our  Focus 

While broadcast technology continues to advance in terms of both ubiquity and bandwidth, improvements in interconnectivity are fueling explosive growth in the amount of data available on-line and the number of clients who wish to access that data. In this paper, we focus on scheduling algorithms for dissemination-oriented applications in which a large and possibly changing client population requests data items from an information source equipped with a data broadcasting capability. The challenge in developing scheduling algorithms for such on-demand data broadcast is to provide scalable performance that balances average and individual (i.e., worst case) responsiveness using the shared broadcast medium. Such algorithms must cope with large databases, large client populations with dynamically changing interests and composition, and high broadcast bandwidth.

As indicated by the preceding discussion, a scheduling approach for large-scale, on-demand data broadcast must balance different requirements. The algorithms that have been developed to date (e.g., [Dyke86], [Wong88], [Vaidya96], [Su97]) have failed to meet one or more of these needs. Some approaches have used simple scheduling policies such as FCFS (First Come First Served), which provide average case performance significantly below what the broadcast medium could support. More sophisticated approaches aimed at providing better performance have been based on assumptions that limit their applicability, such as assuming very small database sizes, static data access probabilities (thereby limiting the ability to adapt to changing client needs), and/or ignoring the overheads associated with making intelligent scheduling decisions.

A key element that has been missing from the previous work is a comprehensive set of metrics for on-demand data broadcast in large-scale data dissemination environments. In this paper, therefore, we first outline the performance criteria that must be addressed by such scheduling algorithms. These criteria include: average and worst case performance, scheduling overhead, and robustness in the presence of certain environmental changes. We show how existing algorithms fail to meet one or more of these criteria.

We then define a parameterized algorithm, called RxW, that performs well for all of these metrics and, furthermore, can be tuned to focus on scheduling overhead, average waiting time, or worst case wait time according to the needs of a particular application. RxW is robust to changes in the client population and workload because it makes scheduling decisions based only on the current queue state, rather than depending on estimates of data item access probabilities.
The remainder of the paper is structured as follows. In Section II we give a brief description of the problem and define the important criteria for evaluating scheduling algorithms for large-scale broadcast. We then describe how previously proposed algorithms measure up to these criteria. In Section III we develop several variants of a new algorithm, called RxW, which has low overhead and provides good average and worst case performance. Section IV presents an evaluation of the algorithm in terms of its performance, scalability, and robustness to workload changes. Section V discusses related work. Finally, Section VI presents our conclusions.

II.  Background 

A.  Environment 

In this section we present a simple satellite-based broadcast scenario to motivate the scheduling problem we are addressing. In this scenario (depicted in Figure 1), clients use two independent networks for communicating with the server: a terrestrial network for sending requests to the server, and a “listen only” satellite downlink to receive data from the server, similar to Hughes Network Systems' DirecPC architecture [DirecPC] and other satellite data services. When a client needs a data item (e.g., a web page or database object) that it cannot find locally, it sends a request for the item to the server. Client requests are queued up (if necessary) at the server upon arrival. The server repeatedly chooses an item from among these requests, broadcasts it over the satellite link, and removes the associated request(s) from the queue. Clients monitor the broadcast and receive the item(s) that they are waiting for.




Fig.  1.  Example  Data  Broadcasting  Scenario 

The focus of this paper is on the scheduling algorithm used by the server to choose the item to broadcast among those that have been requested. Because a single broadcast of an item satisfies all of the outstanding requests for the item, a good scheduling algorithm has the potential to greatly improve the effectiveness of the broadcast.

As in previous work on broadcast scheduling, we make the following assumptions about the environment. First, we assume that data items are fixed-length (e.g., database pages) so that the broadcast bandwidth can be divided into equal length, item-sized “slots”. In the remainder of the paper, we refer to such fixed-sized items as “pages”, we refer to the length (in time) of a broadcast slot as a broadcast tick, and we use such ticks as the unit of time measure.1 Second, we assume that clients continuously monitor the broadcast after they make a request, and we do not consider the effects of transmission errors, so that all clients that are waiting for an item receive that item when it is broadcast by the server. Finally, we ignore the delay for sending requests via the client-to-server uplink, which we expect to be small compared to the latency of obtaining broadcast items from a moderately or heavily loaded server.

B.  Performance  Issues 

Given  the  application  environment  described  so  far,  we  can 
now  state  our  criteria  for  evaluating  broadcast  scheduling 
algorithms  for  large-scale  data  dissemination. 

B.1 Responsiveness

The success of a scheduling algorithm is determined by its ability to get requested data to the clients quickly. In this regard, the first important metric, average waiting time, is the average amount of time from the instant that a client request arrives at the server to the time that the requested item is broadcast. The second, worst case waiting time, is the maximum amount of time that any client request will have to wait in the service queue to be satisfied.
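To make the two responsiveness metrics concrete, the sketch below computes them from a log of per-request times measured in broadcast ticks. The log format of `(arrival_tick, broadcast_tick)` pairs and the function name `waiting_time_metrics` are hypothetical, introduced only for this illustration.

```python
from typing import Iterable, Tuple


def waiting_time_metrics(log: Iterable[Tuple[int, int]]) -> Tuple[float, int]:
    """Return (average, worst case) waiting time in broadcast ticks.

    `log` holds one hypothetical (arrival_tick, broadcast_tick) pair per
    client request: the tick at which the request reached the server and
    the tick at which the requested item was broadcast.
    """
    waits = [broadcast - arrival for arrival, broadcast in log]
    if not waits:
        raise ValueError("no requests in log")
    return sum(waits) / len(waits), max(waits)


# Three requests: two satisfied by one broadcast at tick 3, one at tick 10.
avg, worst = waiting_time_metrics([(0, 3), (1, 3), (2, 10)])
print(avg, worst)  # the individual waits are 3, 2 and 8 ticks
```

Note that the single broadcast at tick 3 satisfies the first two requests at once, which is exactly the sharing that a broadcast scheduler tries to exploit.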

B.2  Scheduling  Overhead 

Because  of  the  requirement  for  scalability,  a  key  aspect  of 
this  study  is  the  consideration  of  scheduling  overhead  at  the 
server.  Overhead  is  examined  in  two  categories: 

1.  Request  Processing  -  When  a  new  request  arrives  at  the 
server,  the  server  must  quickly  decide  whether  or  not 
to  place  an  entry  in  the  request  queue  for  the  requested 
item  and/or  update  and  possibly  restructure  the  queue 
contents.  The  speed  of  such  processing  limits  the  rate  at 
which  requests  can  be  processed  by  the  server,  effectively 
placing  a  limit  on  the  scalability  in  terms  of  request 
arrival  rate  (e.g.,  number  of  clients  supported). 

2. Scheduling Decisions - On every broadcast tick the server must choose a page to broadcast. If the decision overhead is excessive, the server may not be able to support increased broadcast bandwidth or larger database sizes.

The two types of scheduling overhead are related: for example, doing extra work when requests arrive (e.g., keeping sorted lists of page access probabilities) can reduce the cost of making scheduling decisions at the expense of an increased cost for processing requests. The proper trade-off between these costs is dictated by the types of scalability that are important for a particular environment.

1 The fixed length assumption simplifies the algorithm descriptions and analysis. Recent work in broadcast scheduling has shown how to extend scheduling algorithms to incorporate variable-length items [Vaidya96], [Su97].
B.3 Robustness

In order to achieve the goals of responsiveness and scalability, a scheduling algorithm will typically employ approximations and/or heuristics. Such heuristics must not be based on static information that will cause the algorithm to perform poorly if the workload or the environment changes.

C.  Previous  Algorithms 

As stated in the Introduction, several algorithms for on-demand broadcast scheduling have been proposed previously. In this section, we describe existing algorithms and discuss their limitations with respect to the criteria that were outlined in the preceding section. Dykeman et al. [Dyke86] studied on-line scheduling algorithms and were the first to point out that traditional FCFS scheduling would provide poor average wait time for a broadcast environment when the access distribution for data items was non-uniform. They proposed several algorithms aimed at providing improved performance. The algorithms studied in [Dyke86] (and later described in [Wong88]) are the following:

• First Come First Served (FCFS): broadcasts the pages in the order they are requested.

• Most Requests First (MRF): broadcasts the page with the maximum number of pending requests.

• Most Requests First Lowest (MRFL): is essentially the same as MRF, but breaks ties in favor of the page with the lowest access probability.

• Longest Wait First (LWF): selects the page that has the largest total waiting time, i.e., the sum of the time that all pending requests for the item have been waiting.
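The four selection rules can be sketched as follows, assuming a hypothetical queue state that maps each page id to the arrival ticks of its pending requests (MRFL additionally needs per-page access probabilities). This illustrates the rules only, not how [Dyke86] implemented them; in particular, the LWF function sums over every pending request on every call, which is the per-tick cost that makes LWF expensive for large systems.

```python
from typing import Dict, List

# Hypothetical queue state: page id -> arrival ticks of its pending requests.
Pending = Dict[int, List[int]]


def fcfs(pending: Pending) -> int:
    """FCFS: the page whose oldest pending request arrived first."""
    return min(pending, key=lambda p: min(pending[p]))


def mrf(pending: Pending) -> int:
    """MRF: the page with the most pending requests."""
    return max(pending, key=lambda p: len(pending[p]))


def mrfl(pending: Pending, prob: Dict[int, float]) -> int:
    """MRFL: like MRF, but ties go to the lowest access probability."""
    return max(pending, key=lambda p: (len(pending[p]), -prob[p]))


def lwf(pending: Pending, clock: int) -> int:
    """LWF: the largest total wait summed over all pending requests."""
    return max(pending, key=lambda p: sum(clock - t for t in pending[p]))


pending = {1: [0], 2: [5, 6, 7], 3: [2, 3]}
print(fcfs(pending), mrf(pending), lwf(pending, clock=10))  # 1 2 3
```

Python's `max` returns the first maximal key, so the MRF sketch breaks ties by queue order, whereas the MRFL tuple key prefers the page with the lower access probability.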

Figure 2 plots the average waiting time (in broadcast ticks) for a workload with a database of 10000 pages. Client requests for pages are generated using a Zipf distribution of maximum skewness (parameter θ = 1). The results were generated using the simulation environment and default parameters that we describe in Section IV. As in [Dyke86], the overheads associated with running the scheduling algorithm at the server are not modeled here.


Fig.  2.  Average  Wait  Time  for  Algorithms  of  Dykeman  et  al. 


As can be seen in the figure, the best overall performance in this case is provided by LWF. As would be expected, the average wait time increases for all algorithms as more requests are introduced. Perhaps less predictably, however, the average wait time eventually levels off and becomes insensitive to additional load. At this point, the remaining algorithms are approximately 2.5 times slower than LWF. Unfortunately, LWF is not a practical algorithm for a large system, since at each broadcast tick it recalculates the total accumulated wait time for every page with pending requests in order to decide which page to broadcast. For a high-bandwidth system with a large database, such a scheduling algorithm would likely become a bottleneck.2 The MRF and MRFL algorithms were introduced as lower-overhead alternatives.

The results of Figure 2 agree with those of [Dyke86], [Wong88] except for two key points. First, the earlier work did not investigate the performance of the algorithms under very high loads, so it did not identify the flattening of the performance curves for all of the algorithms under high load. Second, in the earlier study MRFL was seen to provide performance between that of FCFS and LWF. Thus, MRFL was proposed as a lower-overhead replacement for LWF. In contrast, our results show that MRFL performs poorly relative to LWF and thus is not a reasonable replacement. The difference in conclusions stems from the fact that the earlier study was performed using a very small database (100 items, compared to 10000 in Figure 2). As the size of the database increases, the probability of a tie for the largest number of requests diminishes. Without ties, MRFL degenerates to MRF and has relatively poor performance.

The poor performance of MRFL for larger systems has also recently been shown by Su and Tassiulas [Su97]. In that paper, they propose an alternative algorithm, called PIP-0.5 (Priority Index Policy), that performs as well as LWF in average wait time. Unlike LWF, PIP-0.5 can be implemented with O(1) complexity for choosing the next page to broadcast. PIP-0.5 falls short of our performance criteria, however, because it is based on estimates of the probability of access for each item. As a result, its usefulness is limited to fairly stable environments where those probabilities do not often change significantly. Furthermore, the history mechanism that must be employed to obtain such probability estimates can result in additional overhead, particularly for very large data sets. Algorithms based on access probabilities and broadcast histories have also been proposed by Vaidya and Hameed [Vaidya96]. These algorithms have similar performance to the PIP-0.5 algorithm, and also share that algorithm's limitations in terms of robustness to changing workloads.

III.  RxW:  A  Parameterized  Algorithm 

We now describe a new broadcast scheduling algorithm, called RxW, which is a practical, low-overhead, scalable approach that provides excellent performance across a range of scenarios.

2 In our implementation of LWF, we found that using one processor of a DEC Alpha 2100 4/275 server and assuming a broadcast bandwidth of 155.52 Mbps, the LWF algorithm became a bottleneck with a database size of 5543 8-KByte pages.
A. Intuition

The results shown in Figure 2 demonstrated that MRF and FCFS have poor average case performance compared to the higher-overhead LWF algorithm. Probing more deeply into their performance in this case leads to a very important observation. Figures 3(a) and 3(b) show the performance of the three algorithms for the 10% most popular pages in the database (i.e., the “hot” pages) and the remaining 90% of the pages (i.e., the “cold” pages), respectively.3

As can be seen in the figures, MRF provides the lowest waiting time for hot pages, but its performance for cold pages is by far the worst of the three algorithms. In contrast, FCFS provides similar performance for both classes of pages, giving it the worst performance of the three for hot pages and the best for cold pages. MRF chooses the page with the highest number of outstanding requests, so requests for infrequently accessed pages must wait until sufficient requests have arrived. Because MRF is not starvation-free, a request for a very cold page may never be satisfied. FCFS, by contrast, is a fair algorithm in which the maximum time a request must wait is the same for all pages. This behavior causes it to spend more bandwidth on requests for cold pages. Because each algorithm favors one class of pages over the other, both have poor average performance. LWF, on the other hand, provides good performance for both types of pages, resulting in better average performance overall. Based on these observations, we set out to combine the two low-overhead approaches (MRF and FCFS) in a way that would balance their strengths and weaknesses and provide a more even-handed treatment of hot and cold pages.

B. Scheduling with RxW

B.1 The Exhaustive Algorithm

We have developed a new scheduling algorithm called RxW (Requests times Wait), which provides good performance by combining the benefits of MRF and FCFS in a way that ensures scalability by having low overhead. RxW broadcasts a page either because it is very popular or because it has at least one long-outstanding request.

RxW maintains a service queue structure with a single entry for each page that has outstanding requests. Entries contain a page identifier (PID), the count of the number of outstanding requests (REQcnt), and a time-stamp of the earliest unsatisfied request for the item (1stARV). This structure is hashed on PID.

3 Recall that the popularity of the pages for this case was generated according to the Zipf distribution.

The exhaustive algorithm works as follows. When a request arrives at the server, the server performs a hash lookup to find the entry for the requested page. If it finds an entry, it simply increments the REQcnt for that entry. If no entry is found, a new one is created with REQcnt initialized to 1 and 1stARV initialized to the current time. At each broadcast tick, the server chooses to broadcast the page with the largest value of R × W, where R = REQcnt, W = clock − 1stARV, and clock is the current time in broadcast ticks. The entry for this page is then removed from the structure. The exhaustive algorithm finds the page with the maximum value by simply performing a linear scan of all the entries. Note that the queue size is limited by N, the number of pages in the database. Thus, the exhaustive RxW algorithm is similar in overhead to other proposed O(N) algorithms, such as [Vaidya96]. Note, however, that RxW makes decisions based only on the current state of outstanding requests and does not depend on estimates of page access probabilities. A detailed analytical study of the average waiting time and the limiting behavior of RxW is provided in [Aksoy97].
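The request handling and exhaustive scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a Python dict stands in for the PID-hashed service queue, time is an integer broadcast-tick counter, and the names `Entry`, `ExhaustiveRxW`, `on_request`, and `next_page` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    pid: int        # page identifier (PID)
    req_cnt: int    # REQcnt: number of outstanding requests
    first_arv: int  # 1stARV: tick of the earliest unsatisfied request


class ExhaustiveRxW:
    """Sketch of exhaustive RxW: O(1) request handling, O(N) decisions."""

    def __init__(self) -> None:
        self.queue: Dict[int, Entry] = {}  # service queue hashed on PID

    def on_request(self, pid: int, clock: int) -> None:
        entry = self.queue.get(pid)
        if entry is None:
            self.queue[pid] = Entry(pid, req_cnt=1, first_arv=clock)
        else:
            entry.req_cnt += 1

    def next_page(self, clock: int) -> Optional[int]:
        """Pick, remove, and return the page with the largest R x W."""
        if not self.queue:
            return None
        best = max(self.queue.values(),
                   key=lambda e: e.req_cnt * (clock - e.first_arv))
        del self.queue[best.pid]
        return best.pid


sched = ExhaustiveRxW()
sched.on_request(7, clock=0)
for t in (4, 5, 6):
    sched.on_request(3, clock=t)
# Page 3: 3 x (10 - 4) = 18 beats page 7: 1 x (10 - 0) = 10.
print(sched.next_page(clock=10))  # 3
```

Request processing is a single hash lookup, while each scheduling decision scans every entry, which is the O(N) cost that the pruning variant is designed to reduce.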

B.2  The  Pruning  Algorithm 

As with other O(N) scheduling algorithms, the overhead of RxW scheduling can be reduced by performing more work during request processing in order to keep the request information better organized. In order to avoid searching the entire list of pages with outstanding requests, we thread two sorted lists through the request queue structure: the Wait list is simply an FCFS queue ordered by 1stARV in ascending order; the second list is the Requests list, which is kept sorted in descending order by REQcnt.

The Wait queue is maintained by simply appending a request entry to it when a request arrives for a page with no outstanding requests and removing a page's entry when the page is broadcast. The Requests list is maintained each time a request is received. This maintenance involves moving the affected entry to the proper place in the sorted list. We introduce an additional structure to speed up the maintenance process and to guarantee scalability in request arrival rates. This structure, called the “REQcnt index”, contains pointers to each cluster of REQcnt values, i.e., all pages that have the same number of pending requests. Values are added to and removed from this index as the request structure evolves. Using this index, request processing is an O(1) operation (note that for any one request, an entry moves exactly one cluster up in the Requests list).
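One way to obtain O(1) request processing is to make each REQcnt cluster a node in a doubly linked list ordered by REQcnt, with a per-page pointer to the page's current cluster, so that a new request only ever moves a page into the adjacent cluster. The sketch below assumes that layout; the class and method names are hypothetical, and the Wait queue (a plain FCFS list) is omitted.

```python
from typing import Dict, Iterator, Optional, Tuple


class Cluster:
    """All pages that currently share one REQcnt value."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.pages: Dict[int, None] = {}         # PIDs (dict as ordered set)
        self.higher: Optional["Cluster"] = None  # neighbour with larger REQcnt
        self.lower: Optional["Cluster"] = None   # neighbour with smaller REQcnt


class RequestsList:
    """Requests list in descending REQcnt order, one node per cluster."""

    def __init__(self) -> None:
        self.head: Optional[Cluster] = None   # cluster with the largest REQcnt
        self.tail: Optional[Cluster] = None   # cluster with the smallest REQcnt
        self.cluster_of: Dict[int, Cluster] = {}  # PID -> its cluster

    def _insert_below(self, upper: Optional[Cluster], node: Cluster) -> None:
        """Link `node` directly below `upper` (at the head if upper is None)."""
        lower = self.head if upper is None else upper.lower
        node.higher, node.lower = upper, lower
        if upper is None:
            self.head = node
        else:
            upper.lower = node
        if lower is None:
            self.tail = node
        else:
            lower.higher = node

    def _unlink(self, node: Cluster) -> None:
        if node.higher is None:
            self.head = node.lower
        else:
            node.higher.lower = node.lower
        if node.lower is None:
            self.tail = node.higher
        else:
            node.lower.higher = node.higher

    def add_request(self, pid: int) -> None:
        """O(1): the page moves exactly one cluster up."""
        old = self.cluster_of.get(pid)
        if old is None:                  # first request: join REQcnt = 1
            target = self.tail
            if target is None or target.count != 1:
                target = Cluster(1)
                self._insert_below(self.tail, target)
        else:                            # move from REQcnt = c to c + 1
            target = old.higher
            if target is None or target.count != old.count + 1:
                target = Cluster(old.count + 1)
                self._insert_below(old.higher, target)
            del old.pages[pid]
            if not old.pages:
                self._unlink(old)
        target.pages[pid] = None
        self.cluster_of[pid] = target

    def remove(self, pid: int) -> None:
        """Drop a page's entry once it has been broadcast."""
        cluster = self.cluster_of.pop(pid)
        del cluster.pages[pid]
        if not cluster.pages:
            self._unlink(cluster)

    def descending(self) -> Iterator[Tuple[int, int]]:
        """Yield (PID, REQcnt) from the largest REQcnt downward."""
        node = self.head
        while node is not None:
            for pid in node.pages:
                yield pid, node.count
            node = node.lower


rl = RequestsList()
for pid in [5, 9, 5, 2, 5, 9]:
    rl.add_request(pid)
print(list(rl.descending()))  # [(5, 3), (9, 2), (2, 1)]
```

Empty clusters are unlinked immediately, so the list never holds more nodes than there are distinct REQcnt values.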


[Fig. 4: Requests list and Wait queue]


The two sorted lists (Requests and Wait) are used to prune the parts of each list that cannot contain the entry with the highest RxW value. This pruning technique is depicted in Figure 4. Note that nodes connected by a dashed line belong to a single service queue entry. On the left is the Requests list, ordered by descending REQcnt, and on the right is the Wait queue, ordered by ascending 1stARV. When an entry is examined from the Requests queue, it is known that all entries not yet scanned have a REQcnt field that is less than or equal to that of the entry. Thus, if the most recently examined entry has values REQcnt′ and 1stARV′, a remaining entry (whose REQcnt is at most REQcnt′) can beat the current maximum RxW value seen so far (MAX) only if

$$\mathrm{clock} - \mathrm{1stARV} > \frac{MAX}{\mathrm{REQcnt}'}$$

where clock is the current time. Thus, the entries that must be searched on the Wait queue are bounded by:

$$\mathrm{limit}(\mathrm{1stARV}) = \min\left(\mathrm{limit}(\mathrm{1stARV}),\ \mathrm{clock} - \frac{MAX}{\mathrm{REQcnt}'}\right)$$

The same kind of pruning can be applied to the Requests list by scanning entries on the Wait queue. The pruning algorithm starts by examining the top of the Requests list (thereby truncating the Wait queue), then examines the top of the Wait queue (thereby truncating the Requests list), and continues alternating until the search reaches the bottom of one of the (truncated) lists. Note that the stopping condition is checked merely by comparing the limit values, rather than by maintaining a pointer to the exact boundary on the list.

This mechanism prunes the search space while still guaranteeing that the search will return the page with the maximum RxW value. Thus, in the performance study that follows, we ignore the exhaustive algorithm and use the term “RxW algorithm” to refer to this pruning variant.
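A sketch of the pruned search follows, under stated assumptions. Entries are hypothetical `(pid, REQcnt, 1stARV)` tuples, and the two lists are built by sorting a snapshot, whereas a server would maintain them incrementally. Instead of tracking separate limit values, the sketch stops as soon as the REQcnt at the current Requests-list position times the wait at the current Wait-queue position cannot exceed MAX: every entry not yet examined lies at or below both positions, so that product bounds its RxW value, and the result is still the maximum.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int]  # hypothetical snapshot: (pid, REQcnt, 1stARV)


def pruned_rxw(entries: List[Entry],
               clock: int) -> Tuple[Optional[int], int, int]:
    """Return (pid, max RxW value, number of entries examined)."""
    by_req = sorted(entries, key=lambda e: -e[1])  # Requests: descending REQcnt
    by_wait = sorted(entries, key=lambda e: e[2])  # Wait: ascending 1stARV
    best, best_val, examined = None, -1, 0
    seen = set()
    i = j = 0
    from_requests = True
    while i < len(by_req) and j < len(by_wait):
        # Upper bound on the RxW value of any entry not yet examined.
        bound = by_req[i][1] * (clock - by_wait[j][2])
        if bound <= best_val:
            break
        if from_requests:
            pid, req_cnt, first_arv = by_req[i]
            i += 1
        else:
            pid, req_cnt, first_arv = by_wait[j]
            j += 1
        from_requests = not from_requests   # alternate between the lists
        if pid in seen:
            continue
        seen.add(pid)
        examined += 1
        value = req_cnt * (clock - first_arv)
        if value > best_val:
            best, best_val = pid, value
    return best, best_val, examined


print(pruned_rxw([(1, 50, 95), (2, 1, 0), (3, 3, 60), (4, 2, 90)], clock=100))
# (1, 250, 2)
```

In the example the search stops after two entries: page 1 gives MAX = 250, and no unexamined entry can exceed 3 × (100 − 60) = 120.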

B.3  Approximation  for  Speed  Up 

Scheduling based on the highest RxW value makes efficient use of broadcast bandwidth. Such efficiency, however, comes at a price in terms of the overhead that is incurred at each scheduling decision. As stated in Section II-B, such overhead can ultimately limit scalability in terms of supportable bandwidth or database size. The pruning technique for RxW aims to reduce this overhead. As will be shown in Section IV-C, this technique is indeed effective: for example, in the main workload of our experiments this pruning resulted in a 72% savings in terms of the number of entries searched in order to find the maximum RxW-valued page. While such a substantial savings is helpful, it is probably not sufficient to keep the scheduling overhead from ultimately becoming a limiting factor as the system is scaled to the huge applications that will be enabled by the national and global broadcasting systems currently being deployed.

Based on insight gained from early experiments with the RxW algorithm, we realized that the scheduling overhead can be reduced dramatically by relaxing the requirement that the chosen page have the maximal RxW value. We have developed an approximate, parameterized variant of RxW that allows the search space to be reduced further, at the possible expense of making less efficient use of the broadcast bandwidth. By varying a single parameter, the algorithm can be tuned from having the same behavior as the RxW algorithm described so far to a constant-time approach that provides maximal scalability.

The parameterized version of RxW is based on two insights about RxW scheduling. First, we found that with highly skewed access patterns (as would be expected in many dissemination-oriented applications), the page with the maximum RxW value is typically found very near the top of at least one of the two lists (Requests or Wait). As a result, even the pruning-based RxW algorithm can spend substantial resources examining entries after it has already encountered the maximum-valued entry. The second insight is that given a static workload (i.e., in terms of request arrival rate and access probability distribution), the average RxW value of the page chosen for broadcast typically converges to some value. This latter insight is exploited to create a self-adapting approximation algorithm based on the RxW value of the most recently broadcast page. We take care, however, to ensure that the approximation works well even in the presence of a changing workload.

The approximation algorithm requires a single parameter called accuracy, which can be set to any value of 0 or greater.4 Scheduling works as follows. First, the algorithm maintains a self-adapting threshold, which is updated on every broadcast tick by averaging the current threshold value with the RxW value of the page that was chosen to be broadcast on that tick. To find the next page to broadcast, the request structure is searched as in the regular (pruning) RxW algorithm, but rather than searching for the page with the maximal RxW value, the algorithm chooses the first page it encounters whose RxW value is greater than or equal to accuracy × threshold. If no such page is found, the algorithm acts like the regular RxW algorithm and returns the page with the maximum RxW value.

4 Typically the accuracy parameter will be set to a value between 0 and 1. Larger values can be used, however. In the limit, setting accuracy

The setting of the accuracy parameter determines the performance tradeoffs between average waiting time, worst case waiting time, and scheduling overhead. The smaller the value of the parameter, the fewer entries are likely to be scanned. At the extreme value of 0, the algorithm simply compares the top entry from both the Requests list and the Wait queue and chooses the one with the higher RxW value. In this case, the complexity of making a scheduling decision is reduced to O(1), ensuring that broadcast scheduling will not become a bottleneck regardless of the broadcast bandwidth, database size, or workload intensity. In the following section, we examine the performance tradeoffs of several settings for the accuracy parameter. We refer to the approximation-based RxW algorithms as RxW.α, where α equals the value of the accuracy parameter as a percentage (e.g., an α setting of 0.80 is called RxW.80).
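The approximate selection rule can be sketched as below. Two details are not fixed by the description above and are assumptions here: each search step examines one entry from the top of each list (which matches the described behavior for accuracy = 0), and the threshold starts at 0. As in a pruned search, the lists are built from a hypothetical snapshot of `(pid, REQcnt, 1stARV)` tuples, and the function and class names are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int]  # hypothetical snapshot: (pid, REQcnt, 1stARV)


def approx_rxw(entries: List[Entry], clock: int, accuracy: float,
               threshold: float) -> Tuple[Optional[int], int]:
    """Return (pid, RxW value) chosen by the approximate rule."""
    by_req = sorted(entries, key=lambda e: -e[1])  # Requests list
    by_wait = sorted(entries, key=lambda e: e[2])  # Wait queue
    target = accuracy * threshold
    best, best_val = None, -1
    for k in range(len(entries)):
        # Assumption: each step examines the next entry of each list.
        for pid, req_cnt, first_arv in (by_req[k], by_wait[k]):
            value = req_cnt * (clock - first_arv)
            if value > best_val:
                best, best_val = pid, value
        if best_val >= target:
            return best, best_val      # good enough: stop searching
        nxt = k + 1
        if (nxt == len(entries)
                or by_req[nxt][1] * (clock - by_wait[nxt][2]) <= best_val):
            break                      # no unexamined entry can beat best_val
    return best, best_val              # fallback: the maximal RxW page


class ApproxRxW:
    """Keeps the self-adapting threshold across broadcast ticks."""

    def __init__(self, accuracy: float, threshold: float = 0.0) -> None:
        self.accuracy = accuracy
        self.threshold = threshold     # initial value is an assumption

    def choose(self, entries: List[Entry], clock: int) -> Optional[int]:
        pid, value = approx_rxw(entries, clock, self.accuracy, self.threshold)
        if pid is not None:
            # Average the old threshold with the RxW value just chosen.
            self.threshold = (self.threshold + value) / 2
        return pid


snapshot = [(1, 10, 99), (2, 1, 80), (3, 5, 90)]
print(approx_rxw(snapshot, 100, accuracy=0.0, threshold=0.0))    # (2, 20)
print(approx_rxw(snapshot, 100, accuracy=1.0, threshold=100.0))  # (3, 50)
```

In the example the maximal page is 3 (value 50), but with accuracy = 0 the sketch settles for page 2 (value 20) after comparing only the two list heads, illustrating the trade of bandwidth efficiency for constant-time decisions.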


IV.  Experimental  Results 


A.  Simulation  Environment 

Our experiments were performed using a simple simulation model of the system using CSIM [Schw86]. As with previous studies, the model is intended to capture only the quality of the schedule produced by the given scheduling algorithms. As such, it does not include the overheads of scheduling and request processing at the server. These costs have been described in the previous sections and are addressed in the scalability portion of the experiments (Section IV-C). Also in keeping with earlier studies, we do not model the costs of using the back-channel for sending requests from the clients to the server, as these costs will be the same for all of the scheduling algorithms.5 The broadcast channel is modeled as a server with a fixed rate of broadcast. We do not specify an absolute value for this rate, but rather use broadcast ticks as our measure of time. This approach emphasizes that the results are not limited to any particular bandwidth and/or data item size, but rather that they describe fundamental tradeoffs among the algorithms.

In  the  model,  the  client  population  is  represented  by  a  sin¬ 
gle  request  stream.  The  client  population  model  generates 
non-blocking  requests  with  exponential  inter-arrival  times 


to ∞ results in the approximate algorithm behaving identically to the regular pruning-based RxW algorithm.

⁵The cost of back-channel requests becomes more important when trading off server push against client pull over the broadcast, as in [Acha97].


with mean λ. We use an open system model since our work is aimed at supporting extremely large, highly dynamic client populations, and such client populations cannot be realistically modeled with a closed simulation system. The request pattern is shaped with a Zipf distribution [Knuth81], a frequently used distribution for non-uniform data access. It produces access patterns that become increasingly skewed as its θ parameter increases from 0 (uniform access probability) to 1 (highly skewed). The offset and freq parameters are used to simulate shifts in the interest of the client population and the frequency of such shifts. The parameters and their settings are summarized in Table I.
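The workload model can be sketched as a small request generator. This is a minimal illustration, not the CSIM model used in the paper: it assumes the usual Zipf form in which the page of popularity rank i is requested with probability proportional to 1/i^θ, treats `rate` as the mean number of requests per tick (as the arrival rates are reported in the experiments), and models an interest shift by rotating which page ids are hot via a hypothetical `offset` argument. The function names are illustrative.

```python
import random
from typing import List, Tuple


def zipf_probabilities(num_pages: int, theta: float) -> List[float]:
    """Probability of each popularity rank 1..num_pages, proportional to 1 / rank**theta."""
    weights = [1.0 / (rank ** theta) for rank in range(1, num_pages + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_requests(num_pages: int, theta: float, rate: float, offset: int,
                      num_requests: int, seed: int = 0) -> List[Tuple[float, int]]:
    """Return (arrival_time_in_ticks, page_id) pairs for an open request stream.

    Inter-arrival times are exponential with `rate` requests per tick on average.
    Popularity rank r is mapped to page id (r + offset) % num_pages, so changing
    `offset` moves the hot spot to a different part of the database.
    """
    rng = random.Random(seed)
    probs = zipf_probabilities(num_pages, theta)
    ranks = rng.choices(range(num_pages), weights=probs, k=num_requests)
    requests = []
    clock = 0.0
    for rank in ranks:
        clock += rng.expovariate(rate)  # mean gap between requests is 1 / rate ticks
        requests.append((clock, (rank + offset) % num_pages))
    return requests


if __name__ == "__main__":
    for t, page in generate_requests(num_pages=10_000, theta=1.0, rate=100.0,
                                     offset=0, num_requests=5):
        print(f"t={t:.4f} ticks  page={page}")
```

With θ = 0 every page is equally likely; raising θ toward 1 concentrates requests on the first few ranks, and changing `offset` between runs shifts the focus of interest without changing the skew.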

B.  Responsiveness 

In the first experiment, we examine the responsiveness of several variants of the RxW algorithm and compare them to the LWF and FCFS algorithms of Dykeman et al. As stated previously, the results we present here measure the quality of the scheduling choices made by the various algorithms and do not take into account the overhead of scheduling and request processing. Under such assumptions, LWF (and similar algorithms such as PIP-0.5) has provided the best average-case performance in previous studies. We report results for four variants of RxW: the pruning algorithm and the approximate algorithm with α values of 0.90, 0.80, and 0 (referred to as RxW.90, RxW.80, and RxW.0, respectively). Recall that RxW.0 examines only the top entry of each of the two sorted lists of requests, and thus makes scheduling decisions in constant time. The four variants of RxW allow us to investigate the tradeoffs between the exact and approximate approaches, and among the various α settings.
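The two-list search can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on stated assumptions: an R list sorted by number of pending requests and a W list sorted by the age of each page's oldest pending request; one entry taken from each list per round; pruning with the bounds given in the comments; and a stopping rule in which the approximate variant accepts the best page seen so far once its R×W reaches α times a `threshold` (for example, a running average of earlier decisions). `PageRequests` and `rxw_select` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PageRequests:
    count: int            # R: number of pending requests for the page
    first_arrival: float  # arrival time (in ticks) of the oldest pending request


def rxw_select(pending: Dict[int, PageRequests], now: float,
               alpha: Optional[float], threshold: float) -> Tuple[Optional[int], float, int]:
    """Choose a page to broadcast; return (page, its R*W score, entries examined).

    alpha=None plays the role of alpha = infinity: the exact, pruning-based search.
    Otherwise the scan stops once the best R*W seen reaches alpha * threshold.
    """
    if not pending:
        return None, 0.0, 0
    # Sketch only: a real server keeps both lists sorted incrementally as requests arrive.
    r_list = sorted(pending, key=lambda p: pending[p].count, reverse=True)  # most requests first
    w_list = sorted(pending, key=lambda p: pending[p].first_arrival)        # oldest request first
    r_max = pending[r_list[0]].count
    w_max = now - pending[w_list[0]].first_arrival
    limit = float("inf") if alpha is None else alpha * threshold

    best_page, best_score, examined = None, -1.0, 0
    for i in range(len(pending)):
        # Examine one entry from each list per round.
        for page in (r_list[i], w_list[i]):
            examined += 1
            score = pending[page].count * (now - pending[page].first_arrival)
            if score > best_score:
                best_page, best_score = page, score
        if best_score >= limit:  # approximation: the current best is "good enough"
            break
        nxt = i + 1
        if nxt == len(pending):
            break
        # Pruning: every unseen page sits at or below position nxt in both lists,
        # so its R*W is bounded by R[nxt] * w_max and by W[nxt] * r_max.
        if pending[r_list[nxt]].count * w_max <= best_score:
            break
        if (now - pending[w_list[nxt]].first_arrival) * r_max <= best_score:
            break
    return best_page, best_score, examined


if __name__ == "__main__":
    queue = {1: PageRequests(50, 90.0), 2: PageRequests(5, 0.0),
             3: PageRequests(20, 60.0), 4: PageRequests(1, 99.0)}
    print(rxw_select(queue, 100.0, alpha=None, threshold=0.0))   # (3, 800.0, 4)
    print(rxw_select(queue, 100.0, alpha=0.0, threshold=700.0))  # (1, 500.0, 2)
```

With α = 0 the first round always satisfies the stopping rule, so only the top entry of each list is examined, which matches the constant-time behavior of RxW.0. Passing `alpha=None` gives the exact search, mirroring the observation that letting α grow without bound recovers the pruning-based algorithm.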

In Figure 5(a), we plot the average waiting time for each scheduling algorithm as the mean request arrival rate is varied from 1 per tick to 1000 per tick along the x-axis (shown with a log scale). All algorithms exhibit qualitatively similar behavior here, with the average wait time increasing but ultimately leveling off as the request arrival rate is increased. This leveling off is a characteristic of broadcast data delivery to clients with shared interests: because a single broadcast of a page satisfies every pending request for it, additional requests for already-requested pages add no broadcast work. It differs dramatically from what would be expected in a unicast environment.

Comparing the six algorithms, it can be seen that LWF and the pruning-based RxW algorithm provide the best average performance (RxW even does slightly better for loads between 5 and 50 requests/tick). The good performance of RxW in this case demonstrates that the scheduling decision metric used by RxW is a reasonable substitute for that of LWF (the total waiting time), and even for perfect knowledge of access probabilities (as used by PIP-0.5, which also has similar performance to LWF). By far the worst average performance in this case is provided by FCFS. As described in Section III-A, for a sufficiently loaded system FCFS allocates the same bandwidth to all accessed pages, regardless of their popularity, resulting in poor utilization of the broadcast.
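The selection rules compared above can be contrasted on a toy queue. This is a simplified sketch: each pending request is represented only by its arrival time, ties are broken arbitrarily by Python's `min`/`max`, and the function names are illustrative rather than taken from the paper.

```python
from typing import Dict, List

# Each page maps to the arrival times (in ticks) of its pending requests.
Queue = Dict[str, List[float]]


def fcfs_pick(pending: Queue, now: float) -> str:
    """FCFS: the page whose oldest pending request arrived earliest (now is unused)."""
    return min(pending, key=lambda p: min(pending[p]))


def lwf_pick(pending: Queue, now: float) -> str:
    """LWF: the page with the largest total waiting time over its pending requests."""
    return max(pending, key=lambda p: sum(now - t for t in pending[p]))


def rxw_pick(pending: Queue, now: float) -> str:
    """RxW: the page maximizing (number of pending requests) x (wait of its oldest request)."""
    return max(pending, key=lambda p: len(pending[p]) * (now - min(pending[p])))


if __name__ == "__main__":
    # At tick 10, a popular page has five recent requests; an unpopular one has one old request.
    pending = {"hot": [6.0, 7.0, 8.0, 9.0, 9.5], "cold": [1.0]}
    for pick in (fcfs_pick, lwf_pick, rxw_pick):
        print(pick.__name__, pick(pending, 10.0))
```

FCFS serves the single old request first even though broadcasting the popular page would satisfy five clients at once, which is the popularity-blind behavior described above; LWF (total wait 10.5 versus 9) and RxW (score 20 versus 9) both favor the popular page here.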

The results for the approximate RxW algorithm show that as the α parameter, which sets the accuracy of the broadcast schedule, is decreased, the average wait time increases. For all three of the values shown, however, the performance is significantly better than that of FCFS. The approximate algorithm with an α value of 0.90 remains less than 10% slower than the maximal algorithm. Even the constant-time RxW.0 pays less than a 33% penalty compared to the maximal algorithm in the most extreme case here.

Figure 5(b) shows the worst-case waiting time measured for the same experiment as Figure 5(a). That is, we plot the longest measured wait for any request that occurs during the simulation run. Note that the simulation was run for one million broadcast ticks, so each page was broadcast at least several times. Although worst-case performance has not been addressed by previous studies, it is an important metric for many applications. As can be seen in the figure, the ordering of the algorithms for worst-case behavior is inverted compared to the average case. FCFS has the shortest worst-case waiting time. In fact, once a page has been requested, it is guaranteed to be scheduled for broadcast before any other page is broadcast twice. Thus, its worst-case waiting time is bounded by the number of pages (10,000 ticks in this case). In contrast, the LWF and RxW algorithms make no such guarantees: popular pages may be broadcast multiple times while requests for a less popular page are waiting.
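A small simulation can check the FCFS bound. It is a sketch under simplifying assumptions that are not from the paper: uniformly random requests, a fixed number of arrivals per tick, and exactly one page broadcast per tick. `fcfs_max_wait` is a hypothetical name.

```python
import random
from collections import OrderedDict
from typing import List


def fcfs_max_wait(num_pages: int, requests_per_tick: int, ticks: int, seed: int = 0) -> int:
    """Simulate page-level FCFS and return the longest wait (in ticks) of any request.

    In each tick, new requests arrive first; then the page at the head of the
    queue (the one whose oldest pending request is earliest) is broadcast and all
    of its pending requests are satisfied. A request for a page that is already
    queued joins that entry instead of creating a new one.
    """
    rng = random.Random(seed)
    queue: "OrderedDict[int, List[int]]" = OrderedDict()  # page -> arrival ticks
    worst = 0
    for tick in range(ticks):
        for _ in range(requests_per_tick):
            page = rng.randrange(num_pages)
            queue.setdefault(page, []).append(tick)
        if queue:
            _, arrivals = queue.popitem(last=False)  # broadcast the head of the queue
            worst = max(worst, max(tick + 1 - a for a in arrivals))
    return worst


if __name__ == "__main__":
    print(fcfs_max_wait(num_pages=50, requests_per_tick=20, ticks=5_000))
```

Because the queue holds each page at most once, a newly requested page has at most `num_pages - 1` pages ahead of it, so the reported worst wait never exceeds `num_pages` ticks no matter how heavy the load is.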

For the approximate RxW algorithm, α serves as a knob for adjusting the worst-case waiting time in the opposite direction from the average case. As α is decreased, the FCFS queue begins to play a larger role in the scheduling process, and thus the behavior begins to look more like FCFS. As a result, for α = 0, the worst-case behavior is within 15% of that of FCFS. Comparing Figures 5(a) and (b), it is apparent that the α parameter provides a flexible mechanism for trading off worst-case and average-case waiting times for a particular application environment, and that it can also be set to balance both concerns reasonably well (e.g., α = 0.80 in this case). In the next section we show that α can also be used to adjust the overhead of the scheduling decision process in order to improve scalability.

C.  Scheduling  Overhead 

As described previously, a critical aspect of scheduling algorithms for large-scale data broadcasting is scalability. The previous section focused on the performance of the algorithms in an ideal setting where there was no overhead for making scheduling decisions or processing requests. In practice, however, such overheads can limit the ability of on-demand systems to support large applications. As described in Sections II-C and III, all of the algorithms are fairly efficient in terms of request processing. They differ significantly, however, in terms of scheduling overhead. In this section we examine the question of scheduling overhead in more detail.

Figures 6(a) and (b) show the average number of request queue entries searched each time a scheduling decision is made (i.e., on each broadcast tick), as the request arrival rate and the database size are increased, respectively. Figure 6(a) corresponds to the same settings as the previous two graphs; that is, the database size is fixed at 10,000 pages and the request arrival rate is varied from 1 to 1000 requests/tick. As can be seen in the figure, LWF is by far the most expensive of the algorithms shown for making scheduling decisions, followed by the maximal RxW. It can also be seen that the approximate version of RxW provides tremendous savings.

LWF is an exhaustive algorithm, and under high loads there is at least one pending request for each data page. Thus, the scheduling cost of LWF is proportional to the number of distinct pages that are accessed by the client population, namely 10,000 pages in this case. For a fast broadcast bandwidth and/or a large database, the scheduling overhead of LWF could easily become a bottleneck. The maximal RxW algorithm, on the other hand, examines significantly fewer queue entries for each scheduling decision. In this case, it examines 2729 entries on average at a load of 1000 requests/tick: a savings of about 72.7%. It is important to note that, as was shown in Figures 5(a) and (b), these savings in search complexity come at no cost in broadcast efficiency.

The savings provided by RxW’s pruning algorithm, however, are dwarfed by the savings provided by the approximate version of the algorithm. In Figure 6(a), at a load of 1000 requests/tick, RxW.90 and RxW.80 examine 116 and 39 entries, respectively, for savings of more than 98.8% and 99.6%, respectively. With α set to 0, the approximate algorithm scans only two entries, providing maximum scalability in terms of search overhead.
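The savings percentages follow directly from the entry counts, taking the 10,000 entries that LWF scans per decision as the baseline. The helper below only reproduces that arithmetic; its name is illustrative.

```python
LWF_ENTRIES = 10_000  # LWF scans one entry per accessed page (10,000 pages at 1000 requests/tick)


def savings_percent(entries_examined: float, baseline: float = LWF_ENTRIES) -> float:
    """Percentage reduction in queue entries examined per decision, relative to LWF."""
    return 100.0 * (1.0 - entries_examined / baseline)


if __name__ == "__main__":
    for name, entries in [("RxW (pruning)", 2729), ("RxW.90", 116), ("RxW.80", 39), ("RxW.0", 2)]:
        print(f"{name:<14} {entries:>5} entries  {savings_percent(entries):6.2f}% fewer than LWF")
```

The pruning-based algorithm saves about 72.7%, RxW.90 and RxW.80 save about 98.8% and 99.6%, and RxW.0, with its two entries, saves about 99.98%.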

Figure 6(b) shows even more striking results for the same algorithms when the request rate is fixed at 100 requests/tick and the database is scaled from 1 to 50,000 pages. The overhead of LWF grows linearly with the database size, approaching the limit of one entry per page. The overhead of maximal RxW also grows linearly, but at a much slower rate. Finally, as in part (a) of the figure, the overhead of the approximate algorithms grows much more slowly, with RxW.0 remaining constant. The practical impact of these results is that the approximate RxW algorithm provides tremendous scalability in terms of request arrival rate and database size. In addition, although not shown here directly, these results indicate that RxW allows a broadcast system to scale in terms of the supportable broadcast bandwidth. Faster broadcast means shorter ticks, and thus, less

⁶Note that FCFS is not shown on these graphs. It is an O(1) scheduling algorithm and so is insensitive to the parameters varied here.


time to make scheduling decisions. RxW is clearly capable of making fast scheduling decisions across a large range of system sizes and workload intensities.
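To make the link between bandwidth and scheduling budget concrete, the arithmetic below computes the length of a tick and the scan rate each algorithm would need. The 4 KB page size is a hypothetical value (the paper deliberately measures time in ticks), and 155.52 Mbps is the lower Teledesic figure quoted in Appendix B; the entry counts are the 10,000 pages LWF scans and the two entries RxW.0 scans. The function names are illustrative.

```python
def tick_seconds(page_bytes: int, bandwidth_bps: float) -> float:
    """Length of one broadcast tick: the time to transmit one page."""
    return page_bytes * 8 / bandwidth_bps


def entries_per_second(entries_per_decision: float, page_bytes: int, bandwidth_bps: float) -> float:
    """Queue entries the scheduler must examine per second to make one decision per tick."""
    return entries_per_decision / tick_seconds(page_bytes, bandwidth_bps)


if __name__ == "__main__":
    PAGE_BYTES = 4096          # hypothetical page size
    BANDWIDTH_BPS = 155.52e6   # hypothetical downlink rate (lower Teledesic figure)
    print(f"tick = {tick_seconds(PAGE_BYTES, BANDWIDTH_BPS) * 1e6:.1f} microseconds")
    for name, entries in [("LWF", 10_000), ("RxW.0", 2)]:
        print(f"{name}: {entries_per_second(entries, PAGE_BYTES, BANDWIDTH_BPS):,.0f} entries/s")
```

Under these assumptions each tick lasts about 0.21 ms, so LWF would need to examine roughly 47 million queue entries per second, versus about 9,500 for RxW.0.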

D.  Robustness 

RxW and its approximations do not depend on any long-term measurements or estimates of data access probability distributions, which enables them to adapt easily to changes in workload. The approximations, however, do use a threshold value that depends on previous performance. To test the robustness of the RxW variants, we performed a detailed sensitivity analysis. Due to space constraints, we only briefly summarize the results of that analysis here; they are reported in more detail in [Aksoy97].

In one set of experiments, the skewness of the access pattern (i.e., θ) was varied between 1 (the default) and 0 (a uniform distribution). As the skew is reduced, all of the algorithms converge to the same average waiting time (at θ = 0). The relative ordering of the algorithms studied remained constant. A second set of experiments kept θ at 1 but moved the focus of interest (i.e., the most popular items) to different parts of the database. The results showed that the approximate variants of RxW were slightly less robust than LWF for infrequent shifts of interest but were significantly more robust than LWF as the frequency of interest shifts was increased. Finally, a third set of experiments introduced a sudden 20-fold spike or a 20-fold decrease in the arrival rate of client requests. The results showed that the number of request queue entries scanned by RxW and its approximations still remained far below that of LWF.

V.  Related  Work 

In this paper we have presented a new on-demand scheduling algorithm for large-scale data broadcast. The directly relevant previous work on scheduling algorithms [Dyke86], [Wong88], [Vaidya96], [Su97] has been addressed in detail in Section II. In addition to this directly related work, there has been much recent interest in other areas of data broadcasting. A taxonomy of data delivery mechanisms (including various forms of broadcast), along with a framework for describing dissemination-based systems, is provided in [Fran97]. Some recent applications of dissemination-based systems include information dissemination on the Internet [Yan96], [Best96], Advanced Traveler Information Systems [Shekhar96], and dissemination using satellite networks [Dao96].

The Datacycle Project [Herm87], [Bowen92] at Bellcore investigated the use of a repetitive broadcast medium for database storage and query processing. Another early effort in information broadcasting, the Boston Community Information System (BCIS), is described in [Giff90]. BCIS broadcast news articles and information over an FM channel to clients with personal computers specially equipped with radio receivers. Recently, scheduling techniques from the real-time community have been applied to data broadcast by Baruah and Bestavros [Baru96]. The Broadcast Disks project [Acha95b] has investigated a number of aspects of data broadcast using periodic push, including scheduling and client caching [Acha95a], prefetching [Acha96], and integrating push and pull over a broadcast channel [Acha97]. The issue of combining broadcast push and unicast pull is addressed in [Stath97]. The mobility group at Rutgers [Imie94] has done significant work on data broadcasting in mobile environments. A main focus has been on indexing in order to reduce power consumption at the mobile clients. Viswanathan [Vis94] has studied integrating push and pull for a mobile broadcast environment.

VI.  Conclusions 

In this paper we focused on the challenges of large-scale on-demand data broadcast introduced by high-bandwidth broadcasting media such as satellite or cable networks. Unlike previous work, we began by proposing a comprehensive set of performance criteria for scheduling algorithms. These criteria include worst-case as well as average response time, three types of scalability, and robustness to changes in the nature and/or intensity of the workload. We then described how previous algorithms fail in one or more of these criteria.

We proposed a scheduling algorithm called RxW that provides a balanced treatment of hot and cold pages in order to achieve good overall performance. The algorithm uses a novel pruning technique to reduce the search space for making broadcast decisions. While the pruning was shown to be effective, we observed that such an algorithm could still eventually become a bottleneck for very large applications.

We developed an approximate, parameterized variant of RxW that allows the search space to be reduced further, at the possible expense of making less efficient use of the broadcast bandwidth. By varying a single parameter, the algorithm can be tuned from the regular RxW algorithm to a constant-time approach that provides maximal scalability. We demonstrated the performance, scalability, and robustness of the different RxW variants through an extensive set of performance experiments.

In terms of future work, we plan to integrate the on-demand scheduling described here with push-based and other forms of data delivery as part of a larger Dissemination-Based Information Systems (DBIS) framework as described in [Fran97]. We also plan to investigate broadcast scheduling for hierarchical broadcast environments.


Acknowledgements 

The authors would like to thank Ugur Cetintemel, Bjorn Thor Jonsson and Mustafa Uysal for their helpful comments on an earlier draft of this paper.


References 

[Acha95a] S. Acharya, R. Alonso, M. Franklin, S. Zdonik, “Broadcast Disks: Data Management for Asymmetric Communication Environments”, Proc. ACM SIGMOD Conf., San Jose, CA, 1995.

[Acha95b] S. Acharya, M. Franklin, S. Zdonik, “Dissemination-based Data Delivery Using Broadcast Disks”, IEEE Personal Communications, 2(6), 1995.

[Acha96] S. Acharya, M. Franklin, S. Zdonik, “Prefetching from a Broadcast Disk”, Proc. International Conference on Data Engineering, New Orleans, LA, Feb. 1996.

[Acha97] S. Acharya, M. Franklin, S. Zdonik, “Balancing Push and Pull for Data Broadcast”, Proc. ACM SIGMOD Conf., May 1997.

[Aksoy97] D. Aksoy, M. Franklin, “On-Demand Broadcast Scheduling”, Technical Report CS-TR-3854, University of Maryland, 1997.

[Baru96] S. Baruah, A. Bestavros, “Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems”, Technical Report TR-96-023, Boston University, August 1996.

[Best96] A. Bestavros, C. Cunha, “Server-initiated Document Dissemination for the WWW”, IEEE Data Engineering Bulletin, 1996.

[Bowen92] T. Bowen, G. Gopal, K. Lee, A. Weinrib, “The Datacycle Architecture”, CACM, Vol. 32, No. 12, December 1992.

[Dao96] S. Dao and B. Perry, “Information Dissemination in Hybrid Satellite/Terrestrial Networks”, Data Engineering, 19(3), 1996.

[DirecPC] Hughes Network Systems, http://www.direcpc.com, July 1997.

[Dyke86] H.D. Dykeman, M. Ammar, J.W. Wong, “Scheduling Algorithms for Videotex Systems Under Broadcast Delivery”, IEEE International Conference on Communications, Toronto, Canada, 1986.

[Fran97] M. Franklin, S. Zdonik, “A Framework for Scalable Dissemination-Based Systems”, Proc. ACM OOPSLA Conf., 1997.

[Giff90] D. Gifford, “Polychannel Systems for Mass Digital Communication”, CACM, 37(10), 1990.

[Herm87] G. Herman, G. Gopal, K. Lee, A. Weinrib, “The Datacycle Architecture for Very High Throughput Database Systems”, Proc. ACM SIGMOD Conf., San Francisco, CA, May 1987.

[Hybrid] “High-Speed Data Access for Teleworkers”, http://www.hybrid.com.

[Imie94] T. Imielinski, B.R. Badrinath, “Energy Efficient Indexing On Air”, Proc. ACM SIGMOD Conf., Minneapolis, MN, May 1994.

[Intel94] Intel Corporation, “Introduction to Intercast Technology”, http://www.intercast.com, 1994.

[Knuth81] D. Knuth, “The Art of Computer Programming, Volume II”, Addison-Wesley, 1981.

[Schw86] H.D. Schwetman, “CSIM: A C-based Process Oriented Simulation Language”, Proc. Winter Simulation Conf., 1986.

[Shekhar96] S. Shekhar, A. Fetterer and D.R. Liu, “Genesis: An Approach to Data Dissemination in Advanced Traveler Information Systems”, Data Engineering, 19(3), 1996.

[Stath97] K. Stathatos, N. Roussopoulos, and J.S. Baras, “Adaptive Data Broadcast in Hybrid Networks”, Proc. VLDB Conf., 1997.

[Su97] C.J. Su, L. Tassiulas, “Broadcast Scheduling for Information Distribution”, Proc. IEEE INFOCOM, 1997.

[Teledesic] “Application of Teledesic Corporation for a Low Earth Satellite System in the Domestic and International Fixed Satellite Services”, filed by Teledesic Corporation with the Federal Communications Commission, March 21, 1994.

[Vaidya96] N.H. Vaidya and S. Hameed, “Data Broadcast in Asymmetric Wireless Environments”, Proc. Workshop on Satellite-based Information Services (WOSBIS), New York, November 1996.

[Vis94] S.R. Viswanathan, “Publishing in Wireless and Wireline Environments”, PhD Thesis, Rutgers University, 1994.

[Wong88] J.W. Wong, “Broadcast Delivery”, Proceedings of the IEEE, pp. 1566-1577, Dec. 1988.

[Yan96] T. Yan and H. Garcia-Molina, “Efficient Dissemination of Information on the Internet”, Data Engineering, 19(3), 1996.




Appendix  B 

Data  Staging  for  On-Demand  Broadcast 




Demet  Aksoy 

University  of  Maryland 
demet@cs.umd.edu


Michael  J.  Franklin 

University  of  Maryland 
franklin@cs.umd.edu 


Stan  Zdonik 

Brown  University 
sbz@cs.brown.edu 


Abstract: Advances in broadcast technology and deployment, along with scalability concerns, have made wide-area data broadcasting an increasingly promising data delivery alternative for large client populations. As a result, there has been significant effort towards developing on-line scheduling algorithms for data broadcast servers. To date, such scheduling algorithms have been aimed at optimizing broadcast bandwidth allocation, and have been based on the assumption that all data items are readily available in the server’s main memory. This approach ignores the data management issues that arise when data items need to be fetched from secondary storage or from remote sites before they can be broadcast. Such data staging concerns, if ignored, can result in significant degradation of the broadcast efficiency. In this paper we propose three data staging solutions that are closely integrated with the RxW broadcast scheduling algorithm [AF98]: opportunistic scheduling, server caching, and prefetching. We then use a data broadcasting testbed based on IP-Multicast to examine the performance of these various solutions. Our results show that data staging concerns are indeed crucial, and that the hints provided by the RxW scheduling algorithm can be effectively used to dramatically enhance the performance of a large-scale on-demand broadcast system.


I.  Introduction 

A.  Asymmetric  On-Demand  Broadcast 

Advances in telecommunications enable new asymmetric infrastructures for high-speed data transmission using satellite networks or cable television networks [Dir96], [Web99], [Cyb99], [Hom98], [Cha98]. The asymmetry is in the relative capacity of the dedicated downlink (from server to clients) and the dedicated uplink (from clients to server).

This work has been partially supported by the NSF under grant IRI-9501353, by Rome Labs agreement number F30602-97-2-0241 under DARPA order number F078, and by research grants from Intel and NEC.


Typically, the downlink bandwidth is much higher than that of the uplink, to match the asymmetry between request and response data volumes (e.g., a mouse click requesting a URL versus the whole contents of the Web page). High-bandwidth links are becoming available on both terrestrial cable networks and satellite networks. For example, the Teledesic system is expected to provide bandwidths from 155.52 Mbps up to 1.244 Gbps [Tel94]. Cable technology offers bandwidths of 36 Mbps per channel downstream, with 110 channels or more [Dat99].

Currently, most infrastructures are based on unicast data delivery even though the network inherently provides broadcast capabilities. With unicast delivery, a data item must be transmitted individually to each client that requests it. This results in serious scalability problems as the client population increases. The client population, meanwhile, is growing enormously with improvements in interconnectivity. For instance, in 1998 IntelliQuest [Int98] reported 62 million Internet users in the US alone. For such large client populations, high downlink bandwidth alone is not sufficient if data is transferred by unicast. In contrast to unicast, broadcast-based delivery allows a single transmission of an item to satisfy all clients that require that item. Compared to traditional unicast data transfer, broadcast can therefore be much more efficient for disseminating information to large client populations, especially for applications where there is a high degree of commonality among client interests. It should be noted that broadcast data transfer on these emerging infrastructures is always as good as or better than unicast data delivery, since the downlink channel is a shared resource that carries only one transmission at a time; that is, it cannot parallelize transmission over multiple connections as point-to-point infrastructures can. In this paper we focus on broadcast-based data dissemination for on-demand data service using the emerging infrastructures.
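The efficiency argument above comes down to counting transmissions. A minimal sketch, assuming every request in the batch is pending at the same time and each item fits in a single transmission; the function names are illustrative.

```python
from typing import List


def unicast_transmissions(requests: List[str]) -> int:
    """Unicast: each request is served by its own transmission."""
    return len(requests)


def broadcast_transmissions(requests: List[str]) -> int:
    """Broadcast: one transmission of an item satisfies every pending request for it."""
    return len(set(requests))


if __name__ == "__main__":
    # Hypothetical batch of pending requests with heavily shared interests.
    batch = ["news"] * 600 + ["weather"] * 300 + ["sports"] * 100
    print(unicast_transmissions(batch), broadcast_transmissions(batch))  # 1000 3
```

When no two clients want the same item the two counts coincide, so broadcast is never worse; the gap widens as client interests overlap.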


Fig.  1.  Example  Data  Broadcasting  Scenario 


An example large-scale on-demand data broadcast environment is shown in Figure 1. In this scenario, clients send requests for data items to a proxy server via an independent uplink channel. The server receives and aggregates those requests in a service queue. Based on the received requests, the server chooses data items to broadcast and sends them to the clients over the shared downlink. Clients monitor the broadcast to receive the items they are interested in. This example depicts an environment similar to what could be provided using a Direct Broadcast Satellite infrastructure such as Hughes Network Systems’ DirecPC [Dir96]. In this case, the uplink is a terrestrial, wired network while the downlink is a high-bandwidth satellite link. Other technologies are, of course, also possible. For example, cable technology is also being used for data broadcasting [Hyb], and Internet multicast technology is improving [mee92].

B.  Broadcast  Scheduling  and  Data  Staging 


A key design consideration in the development of an on-demand data broadcast system is the scheduling algorithm used by the server. Such an algorithm aims to choose, at each instant, the most beneficial data item to broadcast based on the unfulfilled requests that have been received from clients. There has been significant work on the development of on-line scheduling algorithms (e.g., [DAW86], [Won88], [VH96], [ST97b], [AF98]). One main objective of the more recent studies has been developing efficient algorithms with low overhead so that the available broadcast bandwidth can be effectively utilized. To date, however, this work has been based on the assumption that all data items are readily available in the server’s main memory to be broadcast,¹ and has focused largely on optimizing the allocation of broadcast bandwidth through the scheduling decisions made.

¹The only exception of which we are aware is an early study by Dykeman et al. [DW88]. As discussed in Section VIII, this work is based on a scheduling algorithm that is not suitable for large systems, and includes solutions that require fine-grained control over the location of data on magnetic disks.

In many practical applications, however, data may not be available immediately when required by the scheduler. Many applications involve large amounts of data that cannot be cost-effectively stored in main memory. Furthermore, in a wide-area distributed system such as the WWW, the items to be broadcast are likely to reside at a remote site. In either case, data items must be retrieved and brought into the server’s main memory before they can be broadcast. The need to fetch data from various locations produces large variance in service times, which can destroy the performance of traditional scheduling heuristics and can result in significant degradation of broadcast efficiency. For this reason, we have investigated the coordination of broadcast scheduling with the management of the data items to be broadcast. We refer to the process of making data items available for broadcast as data staging.

C. Data Staging Solutions

In this paper, we propose and investigate three complementary approaches to data staging. All three approaches are based on a broadcast scheduling algorithm called RxW, which we have previously shown to be efficient, effective, and robust for a wide range of workload characteristics [AF98], [AF99]. Intuitively, RxW broadcasts an item either if there are many outstanding requests for that item, or if there is at least one long-outstanding request for that item. RxW is described in more detail in Section III. The three data staging approaches we investigate are the following:

• Opportunistic Scheduling: It is crucial to keep the broadcast busy in order to fully exploit the available downlink bandwidth. While the optimal page to broadcast is being brought into the server’s memory, we sometimes broadcast sub-optimal but memory-resident data items instead. We investigate how to select such sub-optimal pages efficiently, with only small deviations from the optimal allocation.

• Caching: One obvious way to reduce the need for fetching data items is to make the best use of the available memory space on the server. The key to successful caching for on-demand broadcast servers is to retain those items that are most likely to be scheduled. The RxW algorithm can provide very good hints for identifying such pages because it differentiates between popular and not-so-popular items. We exploit this property to make intelligent caching decisions.

• Prefetching: Another method to reduce access latency is to predict which items will be broadcast in the near future and to bring them into the cache before they are actually scheduled for broadcast. We examine prefetching in an integrated caching/prefetching environment and exploit hints provided by RxW to identify items that are not cached but are likely to be broadcast in the near future.

We have implemented a data broadcast testbed using IP-Multicast on a cluster of Pentium-based workstations running Windows NT. We use this prototype to study the performance characteristics of the three data staging approaches we propose. In our experiments, we concentrate mainly on disk-resident data sets. The data staging approaches, however, are equally applicable to data residing at remote sites. In order to address this issue we also examine the effectiveness of the approaches as the latency for obtaining data is increased.

The remainder of this paper is organized as follows. In Section II we give a brief overview of the mechanism applied for an integrated scheduling and data staging solution. In Section III we briefly present the RxW scheduling algorithm. This is followed by a description of the prototype system and the experimental environment used to evaluate our data staging techniques in Section IV. Sections V, VI, and VII present the data staging solutions that we propose and analyze their performance. Section VIII discusses related work. Section IX presents our conclusions and plans for future research.


II. Overview of the Mechanism

In this section we briefly explain how the three data staging solutions interact with each other and the role of the scheduling algorithm in the process. As stated in the Introduction, previous work on on-demand broadcast scheduling did not address the issue of data staging. If data staging is completely ignored, the server simply applies the scheduling algorithm and blocks on a cache miss (when the scheduled page is not in memory), waiting for the scheduled page to be faulted into memory. Such blocking causes a significant degradation of broadcast bandwidth utilization, resulting in poor system performance. Instead, the server should initiate an asynchronous request to fetch the missed page. We will show that the implications of such asynchronous requests are not as straightforward as in the case of file systems.

We first explain the server mechanism, in which the data staging solutions are designed to complement one another.


The integrated mechanism is summarized in Figure 2. User requests are queued at the server using a single entry per page. Since all requests for a page are satisfied once that page is broadcast, we do not need to keep track of multiple entries for the same page in the service queue. All information used by the scheduling algorithm is incorporated into this single entry and is updated for each additional request made for the same page. In Figure 2 the server continuously selects a page to broadcast from the service queue. Initially the original scheduling algorithm is used, referred to as schedule in the figure. In this mode, if the scheduled page is in the cache (hit), it is immediately broadcast. If the selected page is not in the cache (miss), an asynchronous request is initiated to fetch the page. This request is queued behind previously requested pages, as shown in the miss queue, and the page's entry is removed from the service queue. Later, when a missed page arrives in memory, it is broadcast as soon as the broadcast channel is available. Meanwhile, as the fetch takes place, the scheduler is run again for the next page to broadcast and the original scheduling process is repeated.²

We place a limit on the number of outstanding I/O requests in the miss queue in order to avoid I/O thread thrashing. When this limit is reached, we change the mode of scheduling and apply the opportunistic scheduling process, marked as opportunistic schedule in Figure 2. In this mode the server selects only cache-resident pages for broadcast.³ That is, if the original scheduling algorithm would select a non-cache-resident page, the opportunistic scheduler instead selects an alternative page that can be broadcast immediately because it is available. Pages that are selected according to availability rather than according to the original scheduling algorithm are shown as opportunistic pages in Figure 2. A key question for opportunistic scheduling is which cache-resident pages to select for broadcast during opportunistic scheduling mode. This issue is examined in detail in Section V.
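To make the control flow of Figure 2 concrete, the following is a minimal Python sketch, not the prototype's code. The class name `StagingServer`, the tick-based timing, the fixed `fetch_ticks` fetch latency and the brute-force R × W selection are illustrative assumptions; only the single entry per page, the asynchronous miss queue, rescheduling after a miss, and the switch to cache-resident-only selection when the I/O limit is reached are taken from the text.

```python
from collections import deque


class StagingServer:
    """Toy, tick-based model of the Figure 2 control loop.

    Hypothetical names and simplifications: one page is broadcast per tick,
    every fetch takes `fetch_ticks` ticks, and R x W is found by brute force.
    """

    def __init__(self, cached, io_limit, fetch_ticks):
        self.cache = set(cached)      # cache-resident pages
        self.service = {}             # page -> [R, arrival time of oldest request]
        self.misses = deque()         # outstanding fetches: [page, ticks_left]
        self.arrived = deque()        # fetched pages waiting for the channel
        self.io_limit = io_limit      # cap on outstanding asynchronous I/Os
        self.fetch_ticks = fetch_ticks
        self.now = 0

    def request(self, page):
        entry = self.service.setdefault(page, [0, self.now])
        entry[0] += 1                 # single entry per page: only R changes

    def _best(self, pages):
        """Page with maximal R x W (W = now - oldest arrival), or None."""
        return max(pages, default=None,
                   key=lambda p: self.service[p][0] * (self.now - self.service[p][1]))

    def _advance_fetches(self):
        for fetch in self.misses:
            fetch[1] -= 1
        while self.misses and self.misses[0][1] <= 0:   # FIFO, equal latency
            page, _ = self.misses.popleft()
            self.cache.add(page)
            self.arrived.append(page)

    def tick(self):
        """Advance one broadcast tick; return (page, reason) or None if idle."""
        self.now += 1
        self._advance_fetches()
        if self.arrived:                          # missed page: send at once
            page = self.arrived.popleft()
            self.service.pop(page, None)          # also covers requests made meanwhile
            return page, "missed"
        while self.service:
            if len(self.misses) >= self.io_limit:     # opportunistic mode
                page = self._best([p for p in self.service if p in self.cache])
                if page is None:
                    return None                   # nothing cached to send
                del self.service[page]
                return page, "opportunistic"
            page = self._best(self.service)       # original schedule mode
            del self.service[page]                # entry leaves the service queue
            if page in self.cache:
                return page, "schedule"
            if not any(f[0] == page for f in self.misses):
                self.misses.append([page, self.fetch_ticks])  # async fetch
            # ...then loop and reschedule with the current queue state
        return None
```

With `io_limit=1`, a single miss fills the miss queue, so the very next decision in the same tick is made in opportunistic mode; the missed page is sent as soon as its simulated fetch completes.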

Another question that we address in this paper is the caching policy used by the server: when a missed page arrives in memory, how should the server cache be managed so that the need to apply opportunistic scheduling is reduced? In Section VI we analyze the server caching policy that we propose specifically for the RxW scheduling algorithm.

The final front of our attack on the data staging problem is to bring pages from the request queue into the cache before they are actually scheduled. This process is shown as prefetch in Figure 2. The pages that are selected for prefetching are queued in a prefetch queue and, when they arrive, are placed in a separate part of the cache, the prefetch cache. Later, when such a page is scheduled, it is moved to the normal cache space. If a page that is being prefetched is scheduled before it arrives in the server's memory, the page is marked to be treated as a missed page, so that it is placed in the normal cache space and broadcast as soon as it arrives. The decision on which pages to prefetch is examined in Section VII.

²Another approach could be to generate multiple candidates during the scheduling process in order to ensure that there is at least one page that can be broadcast immediately. This approach can be especially useful for high-overhead scheduling algorithms. In our case, since the RxW scheduling decision overhead is very low, we take the liberty of rescheduling until a cache hit occurs and therefore use the most current queue state at each scheduling decision.

³As soon as the number of outstanding requests drops back below the limit, we switch back to the original scheduling mode.
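The bookkeeping for the prefetch cache can be sketched as follows. `TwoPartCache` and its methods are hypothetical names; eviction, the ordering of the prefetch queue, and the choice of which pages to prefetch (Section VII) are deliberately left out, so this only illustrates how a page moves between the prefetch and normal cache spaces.

```python
class TwoPartCache:
    """Sketch of the normal/prefetch cache split (names are hypothetical)."""

    def __init__(self):
        self.normal = set()           # normal cache space
        self.prefetch = set()         # separate prefetch cache
        self.in_flight = {}           # page -> True once it must be treated as a miss

    def start_prefetch(self, page):
        """Queue a prefetch unless the page is already cached or in flight."""
        if page not in self.normal and page not in self.prefetch:
            self.in_flight.setdefault(page, False)

    def schedule(self, page):
        """Scheduler picked `page`; return True if it can be broadcast now."""
        if page in self.prefetch:     # prefetch paid off: move to normal cache
            self.prefetch.discard(page)
            self.normal.add(page)
            return True
        if page in self.in_flight:    # still arriving: treat it as a missed page
            self.in_flight[page] = True
            return False
        return page in self.normal    # False here means an ordinary miss

    def arrive(self, page):
        """A prefetch completed; return True if the page must be sent at once."""
        as_miss = self.in_flight.pop(page)
        (self.normal if as_miss else self.prefetch).add(page)
        return as_miss
```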

Different combinations of data staging solutions can yield interesting results for a broadcast environment, as we will observe in Section V-B. In this paper, we take an incremental approach to the design, adding opportunistic scheduling, server caching, and server prefetching in that order. All three approaches are closely related to the scheduling algorithm used.


III. The RxW Scheduling Algorithm

In this section, we briefly describe the RxW scheduling algorithm, which serves as the basis for our integrated broadcast scheduling and data staging techniques. RxW is a practical, low-overhead and scalable algorithm that provides excellent performance across a wide range of settings and performance criteria [AF98]. In this work, we assume that the data items to be broadcast are fixed-length and, thus, we refer to them as pages. Scheduling extensions to handle variable-length data have been developed elsewhere [VH97], [ST97a], [AM98]; similar extensions for data staging are possible and constitute part of our future work. As described in [AF98], the best overall scheduling quality can be obtained by an even-handed treatment of hot (popular) and cold (not-so-popular) pages. Based on this intuition, RxW schedules a page either because it is very popular or because it has an outstanding request that has waited a long time. At each scheduling decision the RxW algorithm chooses to broadcast the page with the maximal R × W value, where R is the number of outstanding requests for that page and W is the time that the oldest outstanding request for that page has been waiting.
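As a reference point for the optimized search described next, a brute-force selection simply evaluates R × W for every entry. This is an illustrative Python sketch: the `queue` mapping and the function name are assumptions, and W is taken as the current time minus the arrival time of the oldest outstanding request.

```python
def rxw_select(queue, now):
    """Return the page maximizing R * W, or None if the queue is empty.

    queue: page -> (R, arrival time of the oldest outstanding request).
    """
    best, best_score = None, -1
    for page, (r, oldest) in queue.items():
        score = r * (now - oldest)    # R x W with W = waiting time of oldest request
        if score > best_score:
            best, best_score = page, score
    return best
```

For instance, a cold page with 2 requests whose oldest request has waited 500 ticks (score 1000) beats a hot page with 80 requests that has waited 10 ticks (score 800), which is the even-handed treatment of hot and cold pages described above.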
A. RxW: Implementation

Scheduling is performed at the server in an on-line fashion. The server maintains a service queue that contains a single entry for each page with outstanding requests. Each entry carries all the information that is necessary for making the scheduling decision, namely the number of outstanding requests for the page (R) and the arrival time of the oldest of those requests. The arrival time is used to compute the waiting time (W) by subtracting it from the current time at each scheduling decision. The server maintains two sorted lists threaded through the service queue: one based on the number of outstanding requests (referred to as the R-list) and the other on the waiting time of the oldest request for that page (referred to as the W-list).⁴
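The list maintenance rules given in footnote 4 can be sketched as below. Python lists and an insertion-ordered dict stand in for the linked lists a real server would thread through the entries, so relocation here costs a linear scan; the class and method names are hypothetical.

```python
class ServiceQueue:
    """Sketch of the RxW service queue with an R-list and a W-list."""

    def __init__(self):
        self.entries = {}   # page -> [R, arrival time of oldest request]
        self.r_list = []    # pages by descending R
        self.w_list = {}    # pages in arrival order of oldest request (oldest first)

    def add_request(self, page, now):
        if page not in self.entries:
            self.entries[page] = [1, now]
            self.r_list.append(page)        # tail of the R-list (R = 1 is minimal)
            self.w_list[page] = None        # tail of the W-list (newest arrival)
            return
        self.entries[page][0] += 1          # W-list position is unaffected
        r = self.entries[page][0]
        i = self.r_list.index(page)
        # Relocate in the R-list: move up past entries with a smaller R.
        while i > 0 and self.entries[self.r_list[i - 1]][0] < r:
            self.r_list[i - 1], self.r_list[i] = self.r_list[i], self.r_list[i - 1]
            i -= 1

    def remove(self, page):
        """Drop a page's entry from both lists once it has been broadcast."""
        del self.entries[page]
        self.r_list.remove(page)
        del self.w_list[page]
```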
Fig. 3. Pruning the Search Space


These two sorted lists are used to avoid an exhaustive search of the service queue when searching for the page with the maximal R × W value. The search technique is depicted in Figure 3. The search starts from the entry at the top of the R-list (the page with the most outstanding requests), which corresponds to the entry for page a in Figure 3. There are 80 outstanding requests for page a, and the oldest of those requests has been waiting for 10 broadcast ticks (a tick being the time needed to broadcast a single page). At this point, the maximum R × W value is set to 800. The R value of the following page on the R-list, page b, is then used to compute a lower limit on the W value for any page that can beat the current maximum R × W value. Since the entries are sorted in descending order, we know that all entries that are not yet scanned at this point have an R value that is less than or equal to 50, the R value of page b. Therefore any entry that could possibly have a higher R × W value must have a W value greater than 800/50 = 16. This computation enables us to prune the W-list, i.e., we need not go any further beyond this limit on the W-list.

⁴Maintaining these two sorted lists is fairly inexpensive. The arrival of a request for a page for which there are currently no outstanding requests results in the creation of a new service queue entry, where the R value is set to 1 and the arrival time is set to the current time. This entry is appended to the tail of both the R-list and the W-list. Thereafter, the arrival of subsequent requests for that page increments the R value of the entry and relocates the entry in the R-list, but does not affect the entry's position in the W-list. Entries are removed from both lists when the corresponding page is broadcast.

Next, the service queue entry for the page at the top of the W-list (the entry with the oldest outstanding request) is examined and, similarly, used to place a lower limit on the R value. The algorithm then keeps alternating between the two lists, raising the limit on the other list as appropriate and thereby pruning the search space further. The search stops when the limit is reached on one of the lists. This technique effectively shrinks the search space while still guaranteeing that the search will return the page with the maximum R × W value.
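The alternating search can be written compactly by observing that every entry not yet examined lies at or below the current position in both lists, so its R × W value is bounded by the R value at the R-list cursor times the W value at the W-list cursor; once that bound no longer exceeds the best value found, no remaining entry can win. The Python sketch below is our formulation of that stopping rule, with hypothetical names, and it takes precomputed (R, W) pairs rather than arrival times.

```python
def rxw_pruned(entries, r_list, w_list):
    """Find the page with maximal R * W without scanning every entry.

    entries: page -> (R, W); r_list: pages by descending R;
    w_list: pages by descending W (oldest outstanding request first).
    Returns (best_page, best_value, lookups).
    """
    best, best_val, lookups = None, -1, 0
    i = j = 0                     # cursors into the R-list and the W-list
    use_r_list = True             # alternate between the two lists
    while i < len(r_list) and j < len(w_list):
        # Entries before either cursor have been examined; any other entry
        # has R <= R(r_list[i]) and W <= W(w_list[j]), bounding its R * W.
        if entries[r_list[i]][0] * entries[w_list[j]][1] <= best_val:
            break                 # limit reached: nothing unexamined can win
        if use_r_list:
            page, i = r_list[i], i + 1
        else:
            page, j = w_list[j], j + 1
        use_r_list = not use_r_list
        r, w = entries[page]
        lookups += 1
        if r * w > best_val:
            best, best_val = page, r * w
    return best, best_val, lookups
```

With page a at (80, 10) heading the R-list, the first lookup sets the maximum to 800; for a next R-list entry with R = 50, only W-list entries with W above 800/50 = 16 can still produce a higher product, which is the limit derived in the text.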

B. RxW.α: A Parameterized Approximation

The overhead of scheduling can be further reduced by relaxing the requirement of broadcasting the page with the highest R × W value. We have observed that the page with the maximal R × W value typically resides much higher in the lists than the point at which the scheduling process ultimately stops; most of the search is done only to guarantee maximality. Using this observation, we developed an approximation-based version of the algorithm. By varying a single parameter α, this approximation ranges in cost from that of the maximal RxW algorithm defined above to a constant-time algorithm. The approximation algorithm broadcasts the first page it encounters whose R × W value is greater than or equal to α times the current threshold value. The threshold is computed as the running average of the R × W value of the last page broadcast and the previous threshold. In cases where no such page is found, the search proceeds within the pruned search space, and the page with the highest R × W value is broadcast. After each broadcast decision, the threshold is updated accordingly.
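A sketch of the approximation follows, using the pruned search of Section III-A but stopping early once the best page found reaches α times the running threshold. Two details are our assumptions rather than the paper's: the threshold test is applied after each pair of lookups (one per list), chosen so that α = 0 reduces to comparing the two list heads as described in footnote 5, and the initial threshold is a caller-supplied value. The class name is hypothetical.

```python
class RxWAlpha:
    """Sketch of the RxW.alpha approximation (class and field names are ours).

    Assumption: the alpha * threshold test is applied after each pair of
    lookups (one per list), so alpha = 0 compares just the two list heads.
    """

    def __init__(self, alpha, threshold=0.0):
        self.alpha = alpha
        self.threshold = threshold        # initial threshold is an assumption

    def select(self, entries, r_list, w_list):
        """entries: page -> (R, W); lists sorted by descending R and W."""
        target = self.alpha * self.threshold
        best, best_val = None, -1
        i = j = 0
        while i < len(r_list) and j < len(w_list):
            # Unexamined entries have R <= R(r_list[i]) and W <= W(w_list[j]).
            if entries[r_list[i]][0] * entries[w_list[j]][1] <= best_val:
                break                     # pruned: nothing left can win
            for page in (r_list[i], w_list[j]):
                r, w = entries[page]
                if r * w > best_val:
                    best, best_val = page, r * w
            i, j = i + 1, j + 1
            if best_val >= target:
                break                     # good enough: stop early
        if best is not None:
            # Running average of the chosen page's R x W and the old threshold.
            self.threshold = (self.threshold + best_val) / 2
        return best
```

A larger α (or a higher threshold) makes the early exit rarer, so the search falls back to the full pruned search and returns the true maximum; α = 0 trades accuracy for constant work per decision.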

The setting of the α parameter determines the performance tradeoffs among average waiting time, worst-case waiting time, and scheduling overhead. The smaller the value of the parameter, the fewer entries are likely to be scanned.⁵ For the experiments in this paper we use RxW.90 (i.e., α = 0.9) as the broadcast scheduling algorithm. We have observed that in our experimental testbed configuration, RxW.90 provides a reasonable tradeoff between scheduling quality (i.e., closeness to the optimal bandwidth allocation) and scheduling overhead (i.e., the time it takes to make a scheduling decision). It should be noted, however, that we have tested our data staging solutions using the full RxW algorithm and its approximations with different α values between 0 and 1. Even though the specific behaviour of the various data staging approaches varies somewhat for different approximation settings, the trends described in this paper hold for all cases tested.

IV.  Prototype 

As stated in Section I, we have implemented a testbed and used it to study our staging approaches. The prototype is implemented on a cluster of Pentium-based workstations running Windows NT 4.0. Each machine has an Intel Pentium Pro 200 MHz CPU and 64 MB of main memory. One of these machines is dedicated as the server. The pages to be broadcast are all initially stored on the server's local disk, a 4 GB Fast Wide SCSI Seagate ST32550.

Each machine has two independent Ethernet connections: one for the uplink and one for the downlink. Requests are sent on a 10 Mbps uplink and the server broadcasts pages on a 100 Mbps downlink. The downlink uses UDP (User Datagram Protocol) to multicast the data to all of the workstations in the cluster, using the IP-multicast support provided with Windows NT 4.0. During experiments, the testbed is isolated from any external network to avoid external network traffic.

⁵Typically the α parameter will be set to a value between 0 and 1. In the limit, setting α to ∞ results in the approximation algorithm behaving identically to the maximal RxW algorithm. At the extreme value of 0, the algorithm simply compares the top entry from both the R-list and the W-list and chooses the one with the higher R × W value. In this case, the complexity of making a scheduling decision is reduced to O(1), ensuring that broadcast scheduling will not become a bottleneck regardless of the broadcast bandwidth, database size, or workload intensity [AF98].

The server has two main responsibilities: request processing (queuing new requests) and broadcast management (making scheduling decisions and broadcasting pages). Two threads running on the server perform these two jobs. To ensure that the request arrival rate is fixed across all algorithms, the request processing thread is given top priority. In our prototype server a single CPU handles both request processing and broadcast management; therefore, the performance numbers that we present include request processing time. To minimize the request processing time that appears in the performance results, we avoided using actual messages from the clients over the uplink. Instead, we pre-generate the request pattern for each experiment and record it on the server disk. The server then uses double buffering to read the requests, filling one buffer while it processes the requests in the other. We monitored the system to verify that the server always finds the second buffer ready when it has consumed the requests from the first buffer. All algorithms evaluated experience exactly the same amount of request processing. File buffering by the operating system is disabled throughout the experiments so as not to interfere with the data staging operations.
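The double-buffered request reader can be illustrated with two buffers guarded by a semaphore: a background thread reads the next chunk of the pre-generated request file into the free buffer while the consumer processes the other one. This Python sketch shows the structure only; `read_chunk` is a hypothetical callback that returns an empty list at end of input, and error handling is omitted.

```python
import queue
import threading


def double_buffered(read_chunk):
    """Yield requests while the next chunk is read in the background.

    read_chunk: hypothetical callback returning the next list of requests
    from the pre-generated request file, or [] at end of input.
    """
    free = threading.Semaphore(2)     # two request buffers in total
    filled = queue.Queue()            # buffers ready for processing, in order

    def reader():
        while True:
            free.acquire()            # wait until a buffer is free
            chunk = read_chunk()
            filled.put(chunk)
            if not chunk:
                return                # empty chunk marks end of input

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = filled.get()          # buffer filled while the other was processed
        if not chunk:
            return
        yield from chunk
        free.release()                # this buffer can now be refilled
```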

In the experiments, we first warm up the server cache and then make sure equilibrium is reached before taking measurements. Equilibrium occurs when the number of outstanding requests in the system stabilizes, i.e., the request satisfaction rate converges to the request arrival rate. We use Little's Law [Tri82] to compute the average waiting time from the logical service queue length.⁶
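Little's Law relates the mean number of outstanding requests L, the request arrival rate λ, and the mean waiting time W as L = λW, so W = L/λ. A minimal sketch, assuming equilibrium and a set of sampled logical queue lengths; the function name and the sample values are illustrative, not measurements from the prototype.

```python
def average_wait(queue_lengths, arrival_rate):
    """Little's Law: W = L / lambda.

    queue_lengths: samples of the logical queue length L (outstanding
    requests) taken at equilibrium; arrival_rate: lambda in requests/sec.
    """
    mean_length = sum(queue_lengths) / len(queue_lengths)
    return mean_length / arrival_rate
```

For example, a hypothetical mean of 5000 outstanding requests at an arrival rate of 1000 requests/sec corresponds to an average wait of 5 seconds.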

V.  Opportunistic  Scheduling 

We introduced the mechanism we use to apply an integrated data staging solution in Section II. Recall that there is a limit on the number of outstanding I/O requests, and that the server switches to opportunistic scheduling mode when this limit is reached, so that only cache-resident pages are scheduled. In this section we describe the algorithms we apply to decide which cache-resident page to broadcast in order to keep the broadcast channel utilized in opportunistic scheduling mode. We then evaluate these algorithms in Section V-B.

⁶Recall that the length of the physical queue maintained at the server is limited by the number of pages in the database and is much smaller than the logical queue length, i.e., the number of outstanding requests.

A.  Opportunistic  Scheduling  Algorithms 

We have developed three algorithms for choosing cache-resident pages for broadcast during opportunistic scheduling mode. The first two of these require that the scheduler be aware of each page's availability. This requires that the service queue entries maintained by RxW be extended with a flag that indicates whether or not the corresponding page is cache-resident. This flag is set when the page is brought into the cache and cleared when the page is replaced. The three approaches are described below:

• Best Cache Resident (OS-BCR): In this algorithm, we run the scheduler as in the original case. However, the maximum R × W value and the limits on either list are updated only for cache-resident pages. As a result, we keep track of the best cache-resident page according to the scheduling algorithm. Note that when we apply opportunistic scheduling, only cache-resident pages are broadcast, and therefore only their entries are deleted from the service queue; when new requests arrive for those pages, their entries are rebuilt starting from the bottom of the lists. As a result, cache-resident pages appear lower in the lists than they normally would in the original scheduling mode. We therefore expect more entries to be scanned during the opportunistic scheduling search, since no matter how high the R × W values we observe on non-cache-resident pages, the search stops only when the best cache-resident page satisfies the stopping condition.

• Earlier Stop (OS-ES): This algorithm aims at scanning fewer entries while making the opportunistic scheduling decision. It simply runs the scheduler as usual, but keeps track of two broadcast candidates: the best cache-resident page and the overall-best page (including pages that are not cache-resident). The search stops when the overall-best page meets the stopping condition, rather than the best cache-resident page. We refer to this case, where the scheduler stops as if all pages were in the cache, as the original application of the algorithm. At the end of this original application, the best cache-resident page (among the ones searched up to this point), if any, is selected for broadcast. If no cache-resident page has been encountered at this point, the search continues until the first cache-resident page is found. Therefore fewer entries need to be scanned during the search using OS-ES. Note that the algorithm does not guarantee that the best cache-resident page (among all that are available) according to the scheduling criteria is broadcast.

• Scan Cache (OS-SC): The intuition behind this algorithm is to minimize the overhead involved in selecting a cache-resident page to broadcast. This algorithm does not use the RxW scheduling algorithm except for the initial check to see if the original application yields a cache hit. Otherwise it cycles through the pages in the cache using a pointer, referred to as nextToBroadcast, initially set to the page at the top of the cache. The algorithm simply broadcasts the page pointed to by nextToBroadcast if there are any outstanding requests for this page. Otherwise the pointer is advanced to the next page in the cache until one with outstanding requests is found and broadcast. After the broadcast, the pointer is advanced to the next page in order to determine the broadcast candidate for the next opportunistic scheduling decision. To avoid unexpected behavior, we also advance the pointer when the page it points to is replaced in the cache.
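A minimal sketch of the OS-SC pointer follows. The class name, the representation of the cache as a list of slots, and the treatment of "top of the cache" as slot 0 are assumptions; the sketch also stops after one full cycle and returns None if no cached page has outstanding requests, a guard the description does not mention.

```python
class ScanCache:
    """Sketch of OS-SC's nextToBroadcast pointer over the cache (hypothetical)."""

    def __init__(self, slots):
        self.slots = list(slots)      # cached pages in cache order
        self.next_to_broadcast = 0    # "top of the cache" taken as slot 0

    def pick(self, outstanding):
        """Return the next cached page with outstanding requests, or None.

        outstanding: page -> number of outstanding requests.
        """
        n = len(self.slots)
        for step in range(n):         # at most one full cycle
            k = (self.next_to_broadcast + step) % n
            if outstanding.get(self.slots[k], 0) > 0:
                self.next_to_broadcast = (k + 1) % n   # move past the sent page
                return self.slots[k]
        return None

    def replace(self, k, new_page):
        """Slot k now holds new_page; skip the pointer past a replaced page."""
        self.slots[k] = new_page
        if self.next_to_broadcast == k:
            self.next_to_broadcast = (k + 1) % len(self.slots)
```

Because the pointer simply rotates through the cached pages that have outstanding requests, each decision costs at most one pass over the cache and never touches the R-list or W-list.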

B.  Opportunistic  Scheduling:  Evaluation 

In this section we evaluate our approaches on the testbed described in Section IV. We first describe the workload applied in this and subsequent experiments. We then report the results of the experiment.

The workload we used for the majority of the experiments is based on a Zipf distribution with θ set to 1 [Knu81]. The Zipf distribution is such that

$$p_i = \frac{(1/i)^{\theta}}{\sum_{j=1}^{N} (1/j)^{\theta}}, \qquad 1 \le i \le N,$$

where p_i is the probability of accessing page i, N is the size of the database and θ is the skewness parameter. The database consists of 10000 pages. All pages are 16 KB and disk-resident. At most 100 concurrent asynchronous I/O requests are allowed during the experiments. The cache size is varied between 5% and 100% of the database. The LRU cache replacement policy is used throughout these experiments. The approaches we have developed are aimed at large-scale systems with many thousands of clients. We therefore stress-test our prototype at an arrival rate of 1000 requests/sec. We have also tried lower and higher rates and observed similar results for the opportunistic scheduling algorithms. We describe the effects on the other data staging approaches in the corresponding sections. We now proceed with the evaluation of opportunistic scheduling.
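The workload generator can be sketched as follows: it computes Zipf access probabilities with skewness θ over N pages and pre-generates a request stream, as is done for each experiment (Section IV). Numbering pages so that page 1 is the hottest, and the use of Python's `random.choices`, are illustrative assumptions rather than details of the prototype.

```python
import random


def zipf_probs(n, theta=1.0):
    """Access probabilities p_i = (1/i)^theta / sum_j (1/j)^theta, i = 1..n."""
    weights = [(1.0 / i) ** theta for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def request_stream(n, count, theta=1.0, seed=0):
    """Pre-generate `count` page requests (page ids 1..n, page 1 hottest)."""
    rng = random.Random(seed)
    return rng.choices(range(1, n + 1), weights=zipf_probs(n, theta), k=count)
```

With θ = 1, page 1 is requested twice as often as page 2 and three times as often as page 3; θ = 0 degenerates to a uniform distribution.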

As already discussed in Section II, blocking on a cache miss is highly undesirable. To see how much of a performance penalty ignoring the data staging problem would bring, we compare the performance of the opportunistic scheduling algorithms against a case where data staging is completely ignored. Figure 4 shows the results of this experiment, in which the average waiting time is measured as the cache size increases from 5% to 100% of the database. In Figure 4, SYNCH refers to the case where the server issues a synchronous I/O request for each cache miss and stalls until the required page is broadcast. As expected, opportunistic scheduling (the bottom three curves) provides orders of magnitude improvement at cache sizes smaller than the full database. For instance, at a 4000-page cache size, all opportunistic scheduling approaches perform 17 times better than SYNCH. The improvement obtained by opportunistic scheduling is due to better bandwidth utilization. For instance, at a 4000-page cache size opportunistic scheduling uses 98% of the bandwidth that could be used if all pages were in the cache, whereas SYNCH uses only 22%. Note that the difference in bandwidth usage would be even more significant for latencies higher than those of secondary storage, i.e., when data needs to be retrieved from remote sites. All curves converge at a 10000-page cache size, since all pages are then in the cache and there are no cache misses to differentiate the behavior. At this point the performance is fully determined by the scheduling algorithm used, which is the same for all approaches.

Next we take a closer look at the opportunistic scheduling algorithms. Figure 5 zooms in on the bottom left corner of Figure 4. Here, we see that OS-SC gives the best performance across all ranges. OS-ES gives the worst performance across all ranges except for the 500-page cache. All algorithms converge at a 4000-page cache, because then the asynchronous miss queue is not full and therefore opportunistic scheduling is not applied by any algorithm. In the figure, we observe a significant performance difference between the algorithms at a 3000-page cache: OS-SC is 2.3 times better than OS-ES at this point. The performance differences come from two factors: the broadcast efficiency and the broadcast error made. The tradeoff between these factors determines the performance of an algorithm.

For instance, OS-ES's poor performance is mostly due to the scheduling error it makes, even though OS-ES has quite good broadcast efficiency. Figure 6 plots the bandwidth usage for the three algorithms measured in the same experiment. The bandwidth usage shown does not include any accompanying data flow information, such as UDP packet headers. Bandwidth is measured as the total number of data bytes broadcast per second. Also note that, due to NT system overhead, bandwidth usage is limited to a maximum of 59 Mbps even if all pages are in the cache. Against this ceiling, OS-ES uses at least 85% of the available bandwidth over the whole range of measured cache sizes. On the other hand, OS-ES results in a very poor broadcast schedule. It effectively converges to always selecting the cache-resident page with the most outstanding requests during opportunistic scheduling mode. Such over-favoring of hot pages has already been shown to provide poor performance [AF98]. Figure 7 plots the bandwidth allocation error measured during the experiment. The bandwidth allocation error metric is computed by comparing the bandwidth allocation generated by the algorithm with the optimal bandwidth allocation defined in [DAW86], in which each page's share of the broadcast is proportional to the square root of its access probability. We measure the rate at which each page is broadcast and average the absolute percentage error relative to this optimal case. As shown in Figure 7, OS-ES results in the highest bandwidth allocation error among all three algorithms.

Fig. 4. Average Waiting Time
Fig. 5. Average Wait - Opportunistic Scheduling Only
Fig. 6. Scheduling Efficiency
Fig. 7. Scheduling Error
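The allocation-error metric can be sketched as below. Optimal shares follow the square-root rule cited from [DAW86]; averaging the per-page absolute percentage error uniformly over pages is our reading of the text, and the function names are hypothetical.

```python
import math


def sqrt_allocation(probs):
    """Optimal broadcast share per page: proportional to sqrt(p_i)."""
    roots = [math.sqrt(p) for p in probs]
    total = sum(roots)
    return [r / total for r in roots]


def allocation_error_pct(broadcast_counts, probs):
    """Mean absolute percentage error of observed vs. square-root shares."""
    total = sum(broadcast_counts)
    optimal = sqrt_allocation(probs)
    errs = [abs(c / total - o) / o
            for c, o in zip(broadcast_counts, optimal) if o > 0]
    return 100.0 * sum(errs) / len(errs)
```

For access probabilities 0.64, 0.16, 0.16 and 0.04, broadcasting pages in direct proportion to those probabilities (over-favoring hot pages) yields a 41% error, while broadcast counts of 40, 20, 20 and 10 match the square-root shares exactly.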

Going back to Figure 5, OS-BCR performs better than OS-ES because OS-BCR makes better broadcast decisions: as shown in Figure 7, OS-BCR makes significantly less error than OS-ES. This holds even though OS-BCR has worse broadcast efficiency (i.e., it is slower) than OS-ES, since it scans more service queue entries to decide on the cache-resident page to broadcast. The scheduling overhead increases significantly when the scheduler is restricted to choosing a cache-resident page. For instance, the number of entries that need to be scanned increases 2000-fold for a cache size of 500 pages compared to OS-ES. In Figure 6, OS-BCR is shown to be the least efficient of the three algorithms.

Finally, OS-SC gives the best performance in Figure 5 because it wins on both factors. OS-SC has the minimum overhead among all three algorithms and therefore yields the best bandwidth usage, as shown in Figure 6. In addition, OS-SC yields a reasonably good broadcast schedule. The cache-resident pages that are broadcast during opportunistic scheduling are only a small subset of the pages of interest to the client population, and therefore a uniform broadcast of pages with outstanding requests does a reasonably good job of scheduling overall.
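To make the fallback concrete, the following is a hypothetical sketch of opportunistic scheduling with a scan-cache fallback in the spirit of OS-SC. It assumes RxW scores a page as R × W, with W measured from the arrival of the page's oldest outstanding request [AF98], and assumes the fallback simply walks the cache in its iteration order; the exact OS-SC scan defined earlier in the paper may differ, and all names (`rxw_choice`, `opportunistic_pick`, `start_fetch`) are illustrative.

```python
def rxw_choice(queue, now):
    """Pick the page with the largest R x W score.

    queue maps page -> (R, t_first): R outstanding requests and the arrival
    time t_first of the oldest one, so W = now - t_first."""
    return max(queue, key=lambda p: queue[p][0] * (now - queue[p][1]))


def opportunistic_pick(queue, cache, now, start_fetch):
    """Broadcast the RxW choice if it is cached; otherwise post-fetch it in the
    background and fill the slot with a cache-resident page that has requests."""
    if not queue:
        return None
    best = rxw_choice(queue, now)
    if best in cache:
        return best
    start_fetch(best)  # hypothetical hook: asynchronous fetch from disk or a remote site
    for page in cache:  # scan-cache fallback (assumed order: cache iteration order)
        if page in queue:
            return page
    return None  # nothing useful is cached; the slot stays idle


fetched = []
pending = {"a": (5, 0.0), "b": (1, 0.0)}
print(opportunistic_pick(pending, ["b"], 10.0, fetched.append))  # 'b'; 'a' is post-fetched
```

Because a broadcast page leaves the service queue, repeated calls move on to other cache-resident pages, which gives the roughly uniform treatment described above.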

These results demonstrate that broadcast efficiency is just one of many aspects that contribute to performance in a real system, and that overall performance is determined by the combined effect of these different components. As described in [AF98], the trade-off between different components was in fact the motivation behind the approximate versions of RxW. Opportunistic scheduling solves the broadcast efficiency problem when compared to synchronous requests; however, it converts this efficiency problem into a broadcast error problem.


VI. Server Cache Management for Broadcast-Based Servers

For a highly skewed distribution such as the Zipf, RxW will broadcast individual hot pages more often than their colder counterparts, but overall, significant bandwidth will be given to cold pages. In our experiments we have observed that approximately 1/3 of the bandwidth is spent broadcasting the top 10% hottest pages, with the remaining 2/3 going to the colder pages. This implies that for small cache sizes, LRU is not very useful. LRU has the well-known property that once a page is accessed (broadcast, in our case), it is placed at the top of the LRU stack, and it then has to travel all the way down before it is replaced from the cache. This implies that for small cache sizes: 1) Cold pages will always result in a cache miss when LRU is employed, since the low frequency of cold page broadcasts makes them the least recently used pages (they travel all the way to the tail of the LRU stack) before they are scheduled again. 2) Not only the cold pages but also the hot pages will suffer in this case, since the high number of cold pages being broadcast can force hot pages to be flushed from the cache before they are scheduled for broadcast again.

Fortunately, it is possible to replace the server cache replacement policy with one that better matches the broadcast scheduling algorithm used. Alternatives to LRU that avoid the problem of cold pages replacing hot pages have been proposed (e.g., LRU-K [OOW93] and 2Q [JS94]). These policies maintain past reference history for items that are no longer in the cache and use it to distinguish cold pages from hot ones. In our environment, however, we have a unique advantage, namely, that the RxW algorithm already provides valuable information that can reliably be used to distinguish hot pages from cold ones, without the need to store additional access history. In the following subsection we investigate a technique that uses this information.

A. LRU With Love/Hate Hints (LH)

In this subsection we describe the algorithm that we use to improve the server's cache management. Recall that RxW aims to provide a balanced treatment of hot and cold pages. More explicitly, RxW broadcasts a page either if it is popular enough or if it has waited long enough. Hot pages are more likely to be broadcast because they have a large number of outstanding requests and hence will be high on the R-list when they are scheduled for broadcast. On the other hand, cold pages are likely to have accumulated some waiting time before they are scheduled and therefore will be high on the W-list. Thus, the data structure used by RxW makes it possible to distinguish between the popularity of pages and treat these pages accordingly. Pages that appear to be hot at the time they are chosen for broadcast are tagged with a “love” hint and placed at the top of the LRU stack, while those that appear to be cold are tagged with a “hate” hint and put at the bottom of the LRU stack, where they are likely to be chosen as replacement victims. We refer to this extension of LRU as “LRU-LH” or simply “LH”. With LH we expect the cache to converge to a state where it keeps only popular pages.
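A minimal sketch of the LH replacement policy follows, assuming the scheduler has already decided whether each broadcast page earns a love or a hate hint. The class name `LoveHateLRU` is hypothetical; an `OrderedDict` stands in for the LRU stack, with its first entry as the bottom (the next victim) and its last entry as the top.

```python
from collections import OrderedDict


class LoveHateLRU:
    """LRU stack where a love hint puts a broadcast page on top and a hate hint
    puts it at the bottom, so hated pages are the next replacement victims."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stack = OrderedDict()  # first item = bottom (LRU end), last = top

    def access(self, page, love):
        """Record that `page` was broadcast; return True on a cache hit."""
        hit = page in self.stack
        if hit:
            del self.stack[page]
        elif len(self.stack) >= self.capacity:
            self.stack.popitem(last=False)  # evict the page at the bottom
        self.stack[page] = True  # place at the top of the stack
        if not love:
            self.stack.move_to_end(page, last=False)  # hate hint: move to bottom
        return hit


# Demo: a 2-page cache sees one hot page followed by two cold broadcasts.
lh, lru = LoveHateLRU(2), LoveHateLRU(2)
for page, love in [("hot", True), ("cold1", False), ("cold2", False)]:
    lh.access(page, love)
    lru.access(page, love=True)  # plain LRU: every broadcast goes to the top
lh_hit = lh.access("hot", love=True)    # True: LH kept the hot page
lru_hit = lru.access("hot", love=True)  # False: LRU flushed it
print(lh_hit, lru_hit)
```

The demo mirrors point 2) of the argument above: under plain LRU, two cold broadcasts push the hot page out of a two-page cache, while LH keeps it.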

To decide whether a page chosen for broadcast should be considered hot (i.e., marked with a love hint), the page must pass the following tests:

1. In the scheduler’s alternating search of the R- and W-lists, the page must be encountered on the R-list before it is encountered on the W-list.

2. The page must appear among the top hot-range pages of the R-list, and no page with the same number of requests may lie beyond this range.

The first test ensures that the page is higher on the R-list than on the W-list. This requirement is satisfied simply by updating the current maximum RxW value only when a strictly larger value is found, excluding equal values. As a result, once a page is selected on the R-list, it is guaranteed that it has not yet been scanned on the W-list, and the page is a candidate hot page. The second test aims to reserve the cache for the hottest pages that can fit. The special handling for “ties” is intended to avoid over-committing the cache in the case that many pages have the same R value. The size of the hot range is set to a fraction of the cache size, scaled by the number of outstanding page requests:

$$\text{hot range} = \mathit{cacheSize} \times \frac{\mathit{entryCnt}}{\mathit{dbSize}}$$

where entryCnt is the total number of pages queued for broadcast at the time of the scheduling decision, and dbSize is the total number of unique pages requested by the client population.7
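The two tests and the hot-range formula can be sketched as follows. This is a simplified, hypothetical version: it assumes the scheduler reports whether its alternating search met the page on the R-list first (`found_on_r_first`), it builds the R-list by sorting outstanding request counts, and it rounds the hot range down to an integer, which the text does not specify.

```python
def hot_range(cache_size, entry_cnt, db_size):
    """Hot range = cacheSize * entryCnt / dbSize, rounded down (rounding is an assumption)."""
    return (cache_size * entry_cnt) // db_size


def love_hint(page, found_on_r_first, r_counts, cache_size, db_size):
    """Return True if `page` passes both tests for a love hint.

    r_counts maps every queued page to its number of outstanding requests (R)."""
    if not found_on_r_first:  # test 1: met on the R-list before the W-list
        return False
    r_list = sorted(r_counts, key=r_counts.get, reverse=True)  # R-list, highest R first
    k = hot_range(cache_size, len(r_counts), db_size)  # entryCnt = number of queued pages
    if page not in r_list[:k]:  # test 2: inside the top hot-range pages...
        return False
    # ...and (tie rule) no page beyond the range may have the same R value
    return all(r_counts[q] != r_counts[page] for q in r_list[k:])


outstanding = {"a": 10, "b": 7, "c": 7, "d": 1}
print(love_hint("a", True, outstanding, cache_size=100, db_size=200))  # True
print(love_hint("b", True, outstanding, cache_size=100, db_size=200))  # False: ties with "c"
```

Page "b" is inside the hot range, but page "c" has the same R value and lies outside it, so the tie rule withholds the love hint rather than over-committing the cache.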

B.  Caching:  Evaluation 

In this section we investigate possible improvements from more suitable server cache management policies. The workload parameters are as described in Section V-B. In this experiment, we take the best opportunistic scheduling algorithm of Section V-B, namely OS-SC, and replace its LRU cache replacement policy with LH. To evaluate the benefits of LH we compare it with LRU and an idealized (i.e., impractical) algorithm called PCACHE. This latter algorithm uses perfect knowledge of the data access distribution and keeps the pages with the highest access probabilities in the cache at all times. PCACHE represents the ideal state toward which LH aims to converge.

We use the same workload as described in Section V-B for this experiment. The average waiting time for the three caching policies is shown in Figure 8. Note that the curve labeled LRU is the same as the curve labeled OS-SC in Figure 5; in all three approaches the OS-SC alternative is used for opportunistic scheduling. On the x-axis the cache size is varied up to 5000 pages, and the y-axis shows the average waiting time for the algorithms. As can be seen in the figure, LRU provides the worst performance across the entire range of cache sizes. LH performs significantly better than LRU; for instance, with a 2000-page cache, LH is 2.8 times better than LRU. LH is almost as good as PCACHE across the entire range. This suggests that LH is very successful in converging to the behavior of keeping the most popular pages in the cache. This observation is further supported by the hit rates shown in Figure 9 for the same experiment. We observe that LH improves the hit rate almost as much as PCACHE does. This confirms that the RxW algorithm makes it possible to distinguish between hot pages and cold pages. Due to the increased hit rate, opportunistic scheduling is used less frequently and the optimal decision made by RxW.90 is realized more often. Going back to Figure 8, we see that all algorithms converge for hit rates over 0.5.8 This performance is obtained when the majority of the missed pages have only one or two requests pending and those requests have already accumulated a high waiting time (around 4 seconds). Therefore, an additional disk I/O latency does not impact the overall performance much. This small penalty of cache misses is easily paid off by the support of opportunistic scheduling.

7 In general, if the database size is not known a priori (e.g., when pages reside on remote sites), an estimate obtained by observing the pages requested from the server can be used.

The main result of Figure 8 is that, with LH, a cache size of 20% of the database is enough for perfect performance. This suggests that LH needs half the cache size LRU would need for this skewed workload. This improvement is especially important when large databases are accessed with a skewed distribution.

VII.  Reducing  I/O  latency 

The final front of our attack on the data staging problem is to bring pages into the cache before they are actually scheduled for broadcast. The trick here is to be able to predict broadcast decisions. As with caching, we exploit properties of the RxW algorithm to make such predictions.

A.  Prefetching 

We focus our prefetching efforts on predicting which cold pages are likely to be chosen for broadcast. There are two reasons behind this choice. First, prefetching requires prediction based on the current state of the service queue. The top of the W-list is stable: it can change only when a page is broadcast. In contrast, the top of the R-list is much more volatile; it can change with every request that arrives at the server. Thus, the prediction of pages that are likely to be chosen due to their W value (i.e., cold pages) is more reliable than for hot pages. Second, the hit rate on cold pages is expected to be very low, and the LH caching policy aims at effectively keeping hot pages in the cache. We therefore apply a division of labor between the two approaches in an integrated design: caching has the responsibility of keeping track of the pages at the top of the R-list, and prefetching has the responsibility of keeping track of the pages at the top of the W-list.

Fig. 8. Average Wait Time
Fig. 9. Hit Rate

8 Note that the hit rates in Figure 9 are not yet at 1 since the cache can keep only half of the pages in the database.

The  main  idea  of  prefetching  is  to  make  sure  that 
all  pages  within  a  defined  range  at  the  top  of  the  W-list 
are  either  already  in  the  cache  or  are  in  the  process  of 
being  prefetched.  The  size  of  the  range  is  a  parameter 
called  prfWindow.  A  buffer  of  prfWindow  pages 
is reserved in the cache (i.e., taken out of the LH-managed space) and background threads are used to
prefetch  pages  into  this  buffer.  When  a  page  from 
the  prefetch  buffer  is  broadcast  (and  hence,  its  entry 
removed  from  the  W-list),  the  prefetch  of  a  new  page 
is  initiated.  If  a  page  is  scheduled  for  broadcast  while 
it  is  in  the  process  of  being  prefetched,  the  page  is 
broadcast  as  soon  as  it  arrives. 
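The window-maintenance rule can be sketched as a function that, given the current service queue, returns the pages whose prefetch should be started. It assumes (hypothetically) that the W-list is ordered by the arrival time of each page's oldest outstanding request and that the caller tracks which pages are cached or already in flight; thread management and buffer accounting are left out.

```python
def prefetch_candidates(first_request_time, cached, in_flight, prf_window):
    """Pages in the top `prf_window` entries of the W-list that still need a prefetch.

    first_request_time maps each queued page to the arrival time of its oldest
    outstanding request; the earliest arrival has waited longest, so it is on top."""
    w_list = sorted(first_request_time, key=first_request_time.get)
    window = w_list[:prf_window]
    return [p for p in window if p not in cached and p not in in_flight]


waiting = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.5}
print(prefetch_candidates(waiting, cached={"d"}, in_flight={"c"}, prf_window=3))  # ['a']
```

Calling the function again each time a page leaves the W-list reproduces the behavior described above: the window slides down and the newly exposed page is fetched.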

B.  Prefetching:  Evaluation 

In this experiment we evaluate the effectiveness of prefetching using the workload described in Section V-B. Figure 10 shows the average wait time with prefetching (labeled “PRF” on the graph) and without it (labeled “LH”). In both cases, Scan Cache Opportunistic Scheduling (OS-SC) and LRU with Love/Hate hints (LH) are applied; the difference comes from the use of prefetching. The total cache size, including the buffer for prefetched pages, is varied from 500 to 5,000 pages. These results were generated with a prfWindow of 250. This value was chosen based on our experiments varying the prefetch window at a cache size of 2000 pages.

In the first experiment, we apply data staging to disk-resident data. Figure 10 shows the limited effectiveness of prefetching in this case: when prefetching is added to LH, there is little or no improvement compared to LH alone. This behavior arises because the average wait for pages is much higher than the time it takes to read a page from disk (even if the disk is highly utilized). The cost of the I/O does not significantly contribute to the time it takes for a user request to be satisfied; thus, prefetching is not very helpful in reducing the latency of secondary storage. The two curves converge when the hit rate of the server is already above the tolerable limit of 0.5 described in Section VI-B, i.e., when additional disk I/O does not impact the waiting time significantly.

In Figure 11, we plot the results for higher page retrieval latencies. In this case we use a cache size of 2000 pages and gradually increase the latency of retrieving pages. As the latency increases, we see a great advantage coming from prefetching. At around 1 second, PRF yields a 95% improvement in waiting time compared to that achieved without prefetching. The performance boost comes from the improved scheduling quality when prefetching is used: we observe that the bandwidth allocation error curves for the same experiment increase in a similar way to the average waiting time curves. The two curves converge at latencies around 25 seconds, where latency becomes the dominating factor.

Fig. 10. Average Wait (Disk Resident Data)
Fig. 11. Average Wait varying Latency

VIII.  Related  Work 

As stated previously, there has been much work on developing scheduling algorithms for on-demand data broadcasting [DAW86], [DW88], [Won88], [VH96], [ST97b], [AF98], and all of this work, with the exception of Dykeman et al. [DW88], ignored the issue of data staging. The Dykeman et al. work was performed assuming an environment that differed substantially from the one studied here. First, the context of that work was Teletext systems, which had much lower bandwidths and hence much smaller databases to broadcast. Thus, the solutions in that earlier paper used a very expensive scheduling and cache replacement algorithm that would not scale to large systems with very large databases. Second, some of the approaches in [DW88] were based on assumptions about detailed control over disk devices that are not applicable to today’s commodity disk drives and controllers, and they do not address the problem of data residing on remote sites. Finally, the Dykeman et al. study was done using simulation, so much of the overhead and contention that arises in a real system was not considered. Despite these differences, this pioneering study provides powerful insight into the data staging problem, and our results share a common conclusion: speeding up the rate at which requested pages are retrieved is of top priority for performance, even if this comes at the expense of not being able to retrieve the page with the highest priority immediately.

The result that the optimal broadcast bandwidth allocation is proportional to the square roots of the page access probabilities was shown in [DAW86] and [AW87]. Recall that this property of broadcast plays a key role in our result that low-overhead Opportunistic Scheduling approaches can perform poorly due to poor bandwidth utilization.

In the more general context, data staging has also been studied for multimedia systems. Ozden et al. [ORS96] studied buffer replacement algorithms for multimedia storage systems that exploit the large file sizes and sequential access found in many multimedia applications. Aggarwal et al. [AWY96] have studied scheduling algorithms for Video-On-Demand systems and proposed a heuristic that uses the number of outstanding requests per page and the broadcast history of each page. This study also ignores the data staging problem. In [FD95], prefetching was shown to be an effective performance enhancer for video-on-demand systems. The success of prefetching in this study, however, is mainly based on the sequential access of video files. Prefetching has been used together with caching to reduce access latencies in many other contexts. For example, Patterson et al.’s informed prefetching study [PGG+95] has shown that prefetching using hints from applications is an effective way of exploiting I/O concurrency. As stated previously, while prefetching is also helpful in the on-demand broadcast setting, especially for high latencies, for secondary storage latencies the need for it is obviated by the ability to do background “post-fetching” with little penalty in additional path length for page access and little degradation of the quality of the broadcast schedule.

IX.  Conclusion 

We have presented a data delivery scheme that is especially well suited to large client populations with a high overlap in interest. In this scheme, the server receives individual pull requests from clients and broadcasts the results. We have shown that a real implementation with a large database and a fast broadcast channel must take data staging concerns seriously in order to achieve reasonable performance.

Our basic approach integrates broadcast scheduling and data staging in a novel way. We use love/hate hints derived from the scheduling data structures to guide the caching of popular pages. Beyond that, we have shown that, while it is tempting to use prefetching to improve the performance of the cache, for secondary storage latencies it is more effective to use a “post-fetching” technique that we call opportunistic scheduling. For secondary storage latencies, the best results occur when we simply keep the broadcast filled with decent items instead of worrying about sending the best items, which might be hard to obtain because of staging problems. For higher latencies, however, prefetching proved to be very effective.

We have implemented a prototype system on Windows NT 4.0 to show that the data staging algorithms that we propose dramatically reduce the penalty of data retrieval latency.

In the future, we plan to focus more on wide-area systems in which the data items of interest may be located on other machines which, like a disk, introduce additional latency. We will also look at problems introduced by variable-length data items and by stream-oriented data with ordering constraints (e.g., video).
1  Introduction

The proliferation of the Internet and intranets, the development of wireless and satellite networks, and the availability of asymmetric, high-bandwidth links to the home, have fueled the development of a wide range of new “dissemination-based” applications. These applications involve the timely distribution of data to a large set of consumers, and include stock and sports tickers, traffic information systems, electronic personalized newspapers, and entertainment delivery. Dissemination-oriented applications have special characteristics that render traditional client-server data management approaches ineffective. These include:

-  tremendous  scale. 

-  a high degree of overlap in user data needs.

-  asymmetric  data  flow  from  sources  to  consumers. 

For example, consider a dissemination-oriented application such as an election result server. Typically, such applications are implemented by simply posting information and updates on a World Wide Web server. Such servers, however, can and often do become overloaded, leaving users unable to access the information in a timely fashion. We argue that such scalability problems are the result of a mismatch between the data access characteristics of the application and the technology (in this case, HTTP) used to implement the application. HTTP is based on a request-response or RPC, unicast (i.e., point-to-point) method of data delivery, which is simply the wrong approach for this type of application.

Using  request-response,  each  user  sends  requests  for  data  to  the  server.  The 
large  audience  for  a  popular  event  can  generate  huge  spikes  in  the  load  at  servers, 
resulting  in  long  delays  and  server  crashes.  Compounding  the  situation  is  that 
users  must  continually  poll  the  server  to  obtain  the  most  current  data,  resulting 
in  multiple  requests  for  the  same  data  items  from  each  user.  In  an  application 
such  as  an  election  server,  where  the  interests  of  a  large  part  of  the  population 
are known a priori, most of these requests are unnecessary.

The  use  of  unicast  data  delivery  likewise  causes  problems  in  the  opposite 
direction  (from  servers  to  clients).  With  unicast  the  server  is  required  to  respond 
individually  to  each  request,  often  transmitting  identical  data.  For  an  application 
with  many  users,  the  costs  of  this  repetition  in  terms  of  network  bandwidth  and 
server  cycles  can  be  devastating. 
To address the particular needs of dissemination-based applications, we are developing a general framework for describing and constructing Dissemination-Based Information Systems (DBIS). The framework incorporates a number of data delivery mechanisms and an architecture for deploying them in a networked environment. The goal is to support a wide range of applications across many varied environments, such as mobile networks, satellite-based systems, and wide-area networks. By combining the various data delivery techniques in a way that matches the characteristics of the application and achieves the most efficient use of the available server and communication resources, the scalability and performance of dissemination-oriented applications can be greatly enhanced.

In  this  paper,  we  provide  an  overview  of  the  current  status  of  our  DBIS 
research  efforts.  We  first  explain  the  framework  and  then  describe  our  initial 
prototype  of  a  DBIS  toolkit.  We  then  focus  on  several  research  results  that  have 
arisen  from  this  effort. 


2  The  DBIS  Framework 

There  are  two  major  aspects  of  the  DBIS  framework.3  First,  the  framework 
incorporates  a  number  of  different  options  for  data  delivery.  A  taxonomy  of 
these  options  is  presented  in  Section  2.1  and  the  methods  are  further  discussed  in 
Section  2.2.  Secondly,  the  framework  exploits  the  notion  of  network  transparency, 
which  allows  data  delivery  mechanisms  to  be  mixed-and-matched  within  a  single 
application.  This  latter  aspect  of  the  framework  is  described  in  Section  2.3. 

2.1  Options  for  Data  Delivery 

We  identify  three  main  characteristics  that  can  be  used  to  describe  data  delivery 
mechanisms: (1) push vs. pull; (2) periodic vs. aperiodic; and (3) unicast vs. 1-to-N. Figure 1 shows these characteristics and how several common mechanisms
relate  to  them. 

Client  Pull  vs.  Server  Push  -  The  first  distinction  we  make  among  data 
delivery  styles  is  that  of  “pull  vs.  push”.  Current  database  servers  and  object 
repositories  support  clients  that  explicitly  send  requests  for  data  items  when 
they  require  them.  When  a  request  is  received  at  a  server,  the  server  locates 
the  information  of  interest  and  returns  it  to  the  client.  This  request-response 
style  of  operation  is  pull-based  —  the  transfer  of  information  from  servers  to 
clients  is  initiated  by  a  client  pull.  In  contrast,  push-based  data  delivery  involves 
sending  information  to  a  client  population  in  advance  of  any  specific  request. 
With  push-based  delivery,  the  server  initiates  the  transfer. 

The tradeoffs between push and pull revolve around the costs of initiating the transfer of data. A pull-based approach requires the use of a backchannel for each request. Furthermore, as described in the Introduction, the server must be interrupted continuously to deal with such requests and has limited flexibility in scheduling the order of data delivery. Also, the information that clients can obtain from a server is limited to that which the clients know to ask for. Thus, new data items or updates to existing data items may go unnoticed at clients unless those clients periodically poll the server.

Fig. 1. Data Delivery Characteristics

3 Parts of this section have been adapted from an earlier paper, which appeared in the 1997 ACM OOPSLA Conference [Fran97].

Push-based approaches, in contrast, avoid the issues identified for client-pull, but have the problem of deciding which data to send to clients in the absence of specific requests. Clearly, sending irrelevant data to clients is a waste of resources. A more serious problem, however, is that in the absence of requests it is possible that the servers will not deliver the specific data that are needed by clients in a timely fashion (if ever). Thus, the usefulness of server push is dependent on the ability of a server to accurately predict the needs of clients. One solution to this problem is to allow the clients to provide a profile of their interests to the servers. Publish/subscribe protocols are one popular mechanism for providing such profiles.


Aperiodic vs. Periodic - Both push and pull can be performed in either an aperiodic or periodic fashion. Aperiodic delivery is event-driven: a data request (for pull) or transmission (for push) is triggered by an event such as a user action (for pull) or a data update (for push). In contrast, periodic delivery is performed according to some pre-arranged schedule. This schedule may be fixed, or may be generated with some degree of randomness.4 An application that sends out stock prices on a regular basis is an example of periodic push, whereas one that sends out stock prices only when they change is an example of aperiodic push.


4 For the purposes of this discussion, we do not distinguish between fixed and randomized schedules. Such a distinction is important in certain applications. For example, algorithms for conserving energy in mobile environments proposed by Imielinski et al. [Imie94b] depend on a strict schedule to allow mobile clients to doze during periods when no data of interest to them will be broadcast.


Unicast vs. 1-to-N - The third characteristic of data delivery mechanisms we identify is whether they are based on unicast or 1-to-N communication. With unicast communication, data items are sent from a data source (e.g., a single server) to one other machine, while 1-to-N communication allows multiple machines to receive the data sent by a data source. Two types of 1-to-N data delivery can be distinguished: multicast and broadcast. With multicast, data is sent to a specific subset of clients. In some systems multicast is implemented by sending a message to a router that maintains the list of recipients. The router reroutes the message to each member of the list. Since the list of recipients is known, it is possible to make multicast reliable; that is, network protocols can be developed that guarantee the eventual delivery of the message to all clients that should receive it. In contrast, broadcasting sends information over a medium on which an unidentified and unbounded set of clients can listen. This differs from multicast in that the clients who may receive the data are not known a priori.

The tradeoffs between these approaches depend upon the commonality of interest of the clients. Using broadcast or multicast, scalability can be improved by allowing multiple clients to receive data sent using a single server message. Such benefits can be obtained, however, only if multiple clients are interested in the same items. If not, then scalability may actually be harmed, as clients may be continually interrupted to filter data that is not of interest to them.
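Because the three characteristics are independent, a delivery mechanism can be described as one point in a 2 × 2 × 2 space. A minimal sketch of that taxonomy as a data type follows; the type and member names are hypothetical, and the two example mappings follow the text’s descriptions of request/response (above) and publish/subscribe (Section 2.2).

```python
from dataclasses import dataclass
from enum import Enum


class Initiation(Enum):
    PULL = "pull"  # a client request triggers the transfer
    PUSH = "push"  # the server initiates the transfer


class Timing(Enum):
    APERIODIC = "aperiodic"  # event-driven
    PERIODIC = "periodic"    # follows a pre-arranged schedule


class Fanout(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-to-N"  # multicast or broadcast


@dataclass(frozen=True)
class DeliveryMode:
    initiation: Initiation
    timing: Timing
    fanout: Fanout

    def describe(self) -> str:
        return f"{self.timing.value} {self.initiation.value} over {self.fanout.value}"


REQUEST_RESPONSE = DeliveryMode(Initiation.PULL, Timing.APERIODIC, Fanout.UNICAST)
PUBLISH_SUBSCRIBE = DeliveryMode(Initiation.PUSH, Timing.APERIODIC, Fanout.ONE_TO_N)
print(REQUEST_RESPONSE.describe())   # aperiodic pull over unicast
print(PUBLISH_SUBSCRIBE.describe())  # aperiodic push over 1-to-N
```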

2.2  Classification  of  Delivery  Mechanisms 

It is possible to classify many existing data delivery mechanisms using the characteristics described above. Such a classification is shown in Figure 1. We discuss
several  of  the  mechanisms  below. 

Aperiodic Pull - Traditional request/response mechanisms use aperiodic pull over a unicast connection. If, instead, a 1-to-N connection is used, then clients can “snoop” on the requests made by other clients, and obtain data that they haven’t explicitly asked for (e.g., see [Acha97, Akso98]).

Periodic  Pull  -  In  some  applications,  such  as  remote  sensing,  a  system  may 
periodically  send  requests  to  other  sites  to  obtain  status  information  or  to  detect 
changed  values.  If  the  information  is  returned  over  a  1-to-N  link,  then  as  with 
request/response,  other  clients  can  snoop  to  obtain  data  items  as  they  go  by. 
Most  existing  Web  or  Internet-based  “push”  systems  are  actually  implemented 
using  Periodic  Pull  between  the  client  machines  and  the  data  source(s)  [Fran98]. 

Aperiodic Push - Publish/subscribe protocols are becoming a popular way to disseminate information in a network [Oki93, Yan95, Glan96]. In a publish/subscribe system, users provide information (sometimes in the form of a profile) indicating the types of information they wish to receive. Publish/subscribe is push-based; data flow is initiated by the data sources, and is aperiodic, as there is no predefined schedule for sending data. Publish/subscribe protocols are inherently 1-to-N in nature, but due to limitations in current Internet technology, they are often implemented using individual unicast messages to multiple clients. Examples of such systems include Internet e-mail lists and some existing “push” systems on the Internet. True 1-to-N delivery is possible through technologies such as IP-Multicast, but such solutions are not universally available across the Internet.

Periodic Push - Periodic push has been used for data dissemination in many systems. Internet mailing lists that send out “digests” on a regular schedule are an example of Periodic Push using unicast. For example, the Majordomo system allows a list manager to set up a schedule (e.g., weekly) for sending digests. Such digests allow users to follow a mailing list without being continually interrupted by individual messages. There have also been many systems that use Periodic Push over a broadcast or multicast link. These include TeleText [Amma85, Wong88], DataCycle [Herm87], Broadcast Disks [Acha95a, Acha95b], and mobile databases [Imie94b].
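The classification discussed above crosses three independent dimensions. A small data model makes that explicit. The enum names and the example rows below are an illustrative encoding of the mechanisms named in the text, not an artifact of the DBIS toolkit.

```python
from dataclasses import dataclass
from enum import Enum


class Initiation(Enum):
    PULL = "pull"
    PUSH = "push"


class Schedule(Enum):
    APERIODIC = "aperiodic"
    PERIODIC = "periodic"


class Fanout(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-to-N"


@dataclass(frozen=True)
class Mechanism:
    name: str
    initiation: Initiation
    schedule: Schedule
    fanout: Fanout


# One row per example mechanism mentioned in the text.
EXAMPLES = [
    Mechanism("request/response", Initiation.PULL, Schedule.APERIODIC, Fanout.UNICAST),
    Mechanism("request/response with snooping", Initiation.PULL, Schedule.APERIODIC, Fanout.ONE_TO_N),
    Mechanism("periodic status polling", Initiation.PULL, Schedule.PERIODIC, Fanout.UNICAST),
    Mechanism("publish/subscribe over IP-Multicast", Initiation.PUSH, Schedule.APERIODIC, Fanout.ONE_TO_N),
    Mechanism("e-mail list", Initiation.PUSH, Schedule.APERIODIC, Fanout.UNICAST),
    Mechanism("mailing-list digest", Initiation.PUSH, Schedule.PERIODIC, Fanout.UNICAST),
    Mechanism("Broadcast Disks", Initiation.PUSH, Schedule.PERIODIC, Fanout.ONE_TO_N),
]


def classify(initiation, schedule):
    """Names of the example mechanisms that fall into one cell of the classification."""
    return [m.name for m in EXAMPLES
            if m.initiation == initiation and m.schedule == schedule]


print(classify(Initiation.PUSH, Schedule.PERIODIC))
# ['mailing-list digest', 'Broadcast Disks']
```

Each cell still splits along the unicast/1-to-N axis, which is why, for example, a digest list and Broadcast Disks share a cell.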


2.3  Network  Transparency 

The  previous  discussion  has  focused  primarily  on  different  modes  of  data  delivery. 
The  second  aspect  of  the  DBIS  framework  addresses  how  those  delivery  modes 
are  used  to  facilitate  the  efficient  transfer  of  data  through  the  nodes  of  a  DBIS 
network.  The  DBIS  framework  defines  three  types  of  nodes: 


1. Data Sources, which provide the base data to be disseminated.

2. Clients, which are net consumers of information.

3. Information Brokers (or agents, mediators, etc.), which acquire information from other sources, possibly add value to that information (e.g., some additional computation or organizational structure), and then distribute this information to other consumers.


Brokers are the glue that binds the DBIS together. Brokers are middlemen; a broker acts as a client to some number of data sources, collects and possibly repackages the data it obtains, and then functions as a data source to other nodes of the system. By creating hierarchies of brokers, information delivery can be tailored to the needs of many different users.

The ability of brokers to function as both clients and data sources provides the basis for the notion of Network Transparency. Receivers of information cannot detect the details of interconnections any further upstream than their immediate predecessor. Because of this transparency, the data delivery mechanism used between two or more nodes can be changed without requiring changes to the data delivery mechanisms used for other communication in the DBIS. For example, suppose that node B is pulling data values from node A on demand. Further, suppose that node C is listening to a periodic broadcast from node B, which includes values that B has pulled from A. Node C will not have to change its data gathering strategy if A begins to push values to B. Changes in links are of interest only to the nodes that are directly involved. Likewise, this transparency allows the “appearance” of the data delivery at any node to differ from the way the data is actually delivered earlier in the network. This, in turn, allows the data delivery mechanisms to be tailored for a given set of nodes. For example, a broker that is typically very heavily loaded with requests could be an excellent candidate for a push-based delivery mechanism to its clients.
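The A/B/C example above can be sketched in a few lines. This is a minimal illustration, assuming in-process method calls in place of network links. The classes `Source` and `Broker` and the method `switch_to_push` are hypothetical names. The point it demonstrates is that `broadcast_cycle`, which is all that node C sees, has the same form whether B obtains data from A by pull or by push.

```python
from typing import Callable, Dict, List


class Source:
    """Node A: holds base data; it can answer pulls or push updates to subscribers."""

    def __init__(self):
        self.data: Dict[str, int] = {}
        self.subscribers: List[Callable[[str, int], None]] = []

    def pull(self, key: str) -> int:
        return self.data[key]

    def update(self, key: str, value: int) -> None:
        self.data[key] = value
        for deliver in self.subscribers:   # push path, used only once someone subscribes
            deliver(key, value)


class Broker:
    """Node B: a client of A and, at the same time, a data source for C."""

    def __init__(self, upstream: Source, keys: List[str]):
        self.upstream, self.keys = upstream, keys
        self.cache: Dict[str, int] = {}
        self.push_mode = False

    def switch_to_push(self) -> None:
        # Only the A-B link changes; nothing downstream is informed.
        self.push_mode = True
        self.upstream.subscribers.append(self.cache.__setitem__)

    def broadcast_cycle(self) -> Dict[str, int]:
        # In pull mode, B refreshes its cache on demand before each broadcast.
        if not self.push_mode:
            for key in self.keys:
                self.cache[key] = self.upstream.pull(key)
        return dict(self.cache)   # what C hears on B's periodic broadcast


a = Source()
a.data["radar"] = 1
b = Broker(a, ["radar"])
print(b.broadcast_cycle())   # {'radar': 1}   (A -> B by pull)
b.switch_to_push()
a.update("radar", 2)
print(b.broadcast_cycle())   # {'radar': 2}   (A -> B by push; same interface for C)
```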

Current Internet “push” technology, such as that provided by PointCast [Rama98], provides an excellent example of network transparency in action. To the user sitting at the screen, the system gives the impression of using aperiodic push over a broadcast channel. Due to current limitations of the Internet, however, that data is actually brought over to the client machine using a stream of periodic pull requests, delivered in a unicast fashion. Thus, the data delivery between the client and the PointCast server is actually the exact opposite of the view that is presented to the user in all three dimensions of the hierarchy of Figure 1. This situation is not unique to PointCast; in fact, it is true for virtually all of the Internet-based push solutions, and stems from the fact that current IP and HTTP protocols do not adequately support push or 1-to-N communication.
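The inversion described above, where periodic unicast pull is presented to the user as aperiodic push, can be sketched as a client-side adapter. This is an assumption-laden illustration: `PollingPushAdapter` is a hypothetical name, `fetch` stands in for one request/response exchange such as an HTTP GET, and the loop stands in for a timer.

```python
from typing import Callable, Dict


class PollingPushAdapter:
    """Turns periodic pulls into push-style callbacks for the application."""

    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 on_push: Callable[[str, str], None]):
        self.fetch, self.on_push = fetch, on_push
        self.last_seen: Dict[str, str] = {}

    def poll_once(self) -> int:
        """One polling period: pull everything, but 'push' only what changed."""
        pushed = 0
        for key, value in self.fetch().items():
            if self.last_seen.get(key) != value:
                self.last_seen[key] = value
                self.on_push(key, value)
                pushed += 1
        return pushed


server_states = iter([{"ticker": "10"}, {"ticker": "10"}, {"ticker": "11"}])
received = []
adapter = PollingPushAdapter(lambda: next(server_states),
                             lambda k, v: received.append((k, v)))
for _ in range(3):   # a real client would sleep for the polling period here
    adapter.poll_once()
print(received)      # [('ticker', '10'), ('ticker', '11')]
```

Every period still costs a full unicast request per client, even when nothing changed (the second poll above), which is the scalability cost the text attributes to these systems.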
Fig. 2. The Map Dissemination Application


3  An  Initial  Prototype 

As stated in the introduction, our ultimate goal is to build a toolkit of components that can be used to create a DBIS tailored to support a particular set of dissemination-based applications. In order to better understand the requirements and desired properties of such a toolkit, we have constructed an initial prototype toolkit and have used it to implement a weather map dissemination application.

Figure 2 shows an example screen from this application. In this application, one or more “map servers” send out updated maps of different types (e.g., radar, satellite image, etc.) for different regions of the United States. Clients can subscribe to updates for specific types of maps for specific regions. They can also pose queries to obtain the most recent versions of specific maps. The DBIS components route such queries to the appropriate server(s). In the current prototype, all maps are multicast to all clients; the clients perform additional filtering to avoid displaying unrequested results to the user. In the remainder of this section, we briefly describe the implementation of the prototype toolkit.


3.1  Toolkit  Description 

Figure 3 shows an example instantiation of a DBIS using the current toolkit.
The  toolkit  consists  of  four  main  components.  These  are  shown  as  lightly-shaded 
items  in  the  figure.  The  darker  shaded  items  are  software  that  is  not  part  of  the 
DBIS  toolkit,  namely,  the  data  sources  and  clients  themselves.  The  components 
of  the  current  prototype  are: 

1.  Data  Source  (DS)  Library  -  a  wrapper  for  data  sources  that  encapsulates 
network  communication  and  provides  conversion  functions  for  data. 

2. Client Library - a wrapper for client programs that encapsulates network communication and provides conversion functions for queries and user profiles. The client library is also responsible for monitoring broadcast and multicast channels and filtering out the data items of local interest that appear on those channels.

3. Information Broker (IB) - the main component of the DBIS toolkit. The IB contains communication, buffering, scheduling, and catalog management components and is described in more detail below.

4.  Information  Broker  Master  -  The  IB  Master  is  responsible  for  managing 
global  catalog  information  about  data  and  about  the  topology  of  the  DBIS. 
All  IBs  must  register  with  the  IB  Master  and  all  catalog  updates  must  be 
sent  to  the  IB  Master.  The  presence  of  the  IB  Master  is  one  of  the  major 
limitations  of  this  initial  prototype,  as  it  is  obviously  a  potential  scalability 
bottleneck for the system. A large part of the design effort for the next version of the prototype is aimed at distributing the functions of the IB Master.
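The centralized role of the IB Master can be sketched as follows. This is a minimal in-process model, assuming that catalog entries are simple category-to-description pairs. `IBMaster`, `CatalogManager`, `register`, and `update_catalog` are hypothetical names, not the toolkit's API. The sketch makes visible why every catalog change funnels through one object.

```python
from typing import Dict


class CatalogManager:
    """Local catalog copy kept at one Information Broker (IB)."""

    def __init__(self):
        self.categories: Dict[str, str] = {}

    def apply(self, category: str, description: str) -> None:
        self.categories[category] = description


class IBMaster:
    """Holds the primary catalog and the set of registered IBs."""

    def __init__(self):
        self.catalog: Dict[str, str] = {}
        self.brokers: Dict[str, CatalogManager] = {}

    def register(self, ib_id: str, local: CatalogManager) -> None:
        self.brokers[ib_id] = local
        for category, description in self.catalog.items():   # bring the new IB up to date
            local.apply(category, description)

    def update_catalog(self, category: str, description: str) -> None:
        # Every change passes through here and fans out to every IB, so the
        # work grows with the number of brokers: the bottleneck noted above.
        self.catalog[category] = description
        for local in self.brokers.values():
            local.apply(category, description)


master = IBMaster()
ib1, ib2 = CatalogManager(), CatalogManager()
master.register("ib1", ib1)
master.update_catalog("radar", "radar maps by region")
master.register("ib2", ib2)
print(ib2.categories)   # {'radar': 'radar maps by region'}
```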


3.2  Data  Modeling  Considerations 

The DBIS prototype currently uses a simple data model: the catalog consists of a set of category definitions. Categories are application-specific; that is, each application provides its own set of category definitions. Each data item is associated with a single category. In addition, a set of keywords can be associated with each data item. Categories and keywords are used in the specification of queries and profiles. Queries are pull requests that are transmitted from a client to a data source. Queries consist of a category and optional keywords. Queries are processed at a data source (or an IB); all data items that match the category (and at least one of the keywords, if specified) are sent to the client from which the query originated. In contrast, profiles are used to support push-based delivery. When a new data item arrives at an IB, its category and keywords are compared with the user profiles registered at that IB, and the item is sent to any clients whose profile indicates an interest in the item. Thus, profiles can be viewed as a form of continually executing query.

Fig. 3. An Instantiation of a DBIS
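The matching rule stated above (same category, plus at least one shared keyword when keywords are given) serves both pull and push. The sketch below assumes keywords are plain strings. The names `DataItem`, `Query`, `matches`, `answer_query`, and `route_new_item` are hypothetical. It uses one predicate for both directions to show how a profile amounts to a stored query.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DataItem:
    category: str
    keywords: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Query:                       # also used as a profile
    category: str
    keywords: FrozenSet[str] = frozenset()   # optional; empty means "any keyword"


def matches(q: Query, item: DataItem) -> bool:
    """Same category and, if keywords are given, at least one shared keyword."""
    if q.category != item.category:
        return False
    return not q.keywords or bool(q.keywords & item.keywords)


def answer_query(q: Query, store: List[DataItem]) -> List[DataItem]:
    # Pull: evaluated once against the items already held at the source or IB.
    return [item for item in store if matches(q, item)]


def route_new_item(item: DataItem, profiles: Dict[str, Query]) -> List[str]:
    # Push: each arriving item is tested against every registered profile.
    return [client for client, prof in profiles.items() if matches(prof, item)]


radar_ne = DataItem("radar", frozenset({"northeast"}))
profiles = {
    "c1": Query("radar", frozenset({"northeast", "southeast"})),
    "c2": Query("satellite"),
    "c3": Query("radar"),
}
print(route_new_item(radar_ne, profiles))   # ['c1', 'c3']
```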

Clearly, this simple approach to data modeling must be extended to support more sophisticated applications. We are currently exploring database and WWW-based (e.g., XML) approaches to semi-structured data modeling for use in subsequent versions of the toolkit.


3.3  Information  Broker  Architecture 

As stated above, the Information Broker module contains most of the functionality of the DBIS toolkit. The architecture of an IB is illustrated in Figure 4. The basic components of the IB are the following:

Fig. 4. Information Broker (IB) Architecture


- Catalog Manager - This component manages local copies of catalog information for use by the processes running at the broker. Recall that the primary copy of the catalog is managed by the IB Master. All requested changes to the catalog information are sent to the IB Master, which then propagates them to the catalog managers of all other IBs.

- Data Source Manager - This component is in charge of receiving and filtering data items obtained from the data sources. It manages a separate listener thread for each data source directly connected to the IB.

- Broker Agent - This component is responsible for IB-to-IB interaction, that is, when an IB receives data from another IB rather than directly from a data source.
- Broadcast Manager - Once data has been filtered through the data source manager or the broker agent, it is passed to the Broadcast Manager, which has two main components. The Mapper assigns the data item to one or more physical communication channels. The Scheduler makes decisions about the order in which data items should be placed on those channels.

- Network Manager - This is the lowest level of the communication component of the IB. It sends data packets to the network according to the information provided by the Broadcast Manager.

-  Client  Manager  -  This  module  handles  all  requests  that  arrive  from  the 
clients  of  the  IB.  It  forwards  these  requests  to  the  proper  modules  within 
the  IB  and  maintains  communication  sessions  with  the  clients. 
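The path that a filtered item takes through the Broadcast Manager (Mapper, then Scheduler) and on to the Network Manager can be sketched as a small pipeline. This sketch rests on several assumptions. Channels are chosen from a category-to-channel table. The Scheduler orders items by a numeric priority, lower values first, which is a placeholder policy. "Sending" means appending to a list. All names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def mapper(category: str, channel_of: Dict[str, List[int]]) -> List[int]:
    """Assign an item to one or more physical channels (channel 0 by default)."""
    return channel_of.get(category, [0])


class Scheduler:
    """Per-channel ordering; here simply by priority, lower first."""

    def __init__(self):
        self.queues: Dict[int, List[Tuple[int, int, str]]] = defaultdict(list)
        self.seq = 0   # tie-breaker that keeps arrival order among equal priorities

    def enqueue(self, channel: int, priority: int, item: str) -> None:
        heapq.heappush(self.queues[channel], (priority, self.seq, item))
        self.seq += 1

    def next_for(self, channel: int) -> Optional[str]:
        q = self.queues[channel]
        return heapq.heappop(q)[2] if q else None


def broadcast_manager(items, channel_of, scheduler: Scheduler) -> None:
    # items: (category, priority, payload) tuples already filtered upstream
    # by the data source manager or the broker agent.
    for category, priority, payload in items:
        for ch in mapper(category, channel_of):
            scheduler.enqueue(ch, priority, payload)


def network_manager(scheduler: Scheduler, channel: int) -> List[str]:
    """Drain one channel in scheduled order (stands in for sending packets)."""
    sent = []
    while (item := scheduler.next_for(channel)) is not None:
        sent.append(item)
    return sent


s = Scheduler()
broadcast_manager([("radar", 2, "radar-NE"), ("satellite", 1, "sat-US"),
                   ("radar", 1, "radar-SE")],
                  {"radar": [0, 1], "satellite": [1]}, s)
print(network_manager(s, 0))   # ['radar-SE', 'radar-NE']
print(network_manager(s, 1))   # ['sat-US', 'radar-SE', 'radar-NE']
```

Keeping the Mapper and the Scheduler separate, as the IB design does, means the channel assignment and the ordering policy can each be replaced without touching the other.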


4  Example  Research  Topics 

Having described our general approach to building Dissemination-Based Information Systems, we now focus on two examples of the many research issues that arise in the development of such systems.


Fig.  5.  Example  Data  Broadcasting  Scenario 


4.1  Topic  1:  On  Demand  Broadcast  Scheduling 

As described in Section 2.1, one of the many possible mechanisms for data dissemination uses on-demand (i.e., aperiodic pull) broadcast of data. An example scenario using such data delivery is shown in Figure 5. In this scenario, two independent networks are used: a terrestrial network for sending pull requests to the server, and a “listen only” satellite downlink over which the server broadcasts data to all of the clients. When a client needs a data item (e.g., a web page or database object) that it cannot find locally, it sends a request for the item to the server. Client requests are queued up (if necessary) at the server upon arrival. The server repeatedly chooses an item from among these requests, broadcasts it over the satellite link, and removes the associated request(s) from the queue. Clients monitor the broadcast and receive the item(s) that they require.
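The server loop above can be sketched with a service queue that keeps a single entry per requested page (as described for RxW below), a request count R, and the arrival time of the oldest request. This is a minimal sketch assuming simulated time passed in as a number. `OnDemandServer` and `fcfs` are hypothetical names, and the scheduling policy is a pluggable function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    page: str
    r: int = 0                  # outstanding requests for the page
    first_arrival: float = 0.0  # arrival time of the oldest such request


class OnDemandServer:
    """Service queue with one entry per requested page; policy is pluggable."""

    def __init__(self, choose: Callable[[List[Entry], float], Entry]):
        self.queue: Dict[str, Entry] = {}
        self.choose = choose

    def request(self, page: str, now: float) -> None:
        # Arrives over the terrestrial uplink; a repeat request only bumps R.
        entry = self.queue.get(page)
        if entry is None:
            self.queue[page] = Entry(page, 1, now)
        else:
            entry.r += 1

    def broadcast_next(self, now: float) -> Optional[str]:
        """Choose a page, 'broadcast' it, and drop every request it satisfies."""
        if not self.queue:
            return None
        chosen = self.choose(list(self.queue.values()), now)
        del self.queue[chosen.page]
        return chosen.page


def fcfs(entries: List[Entry], now: float) -> Entry:
    return min(entries, key=lambda e: e.first_arrival)


server = OnDemandServer(fcfs)
for t, page in [(0, "a"), (1, "b"), (2, "b"), (3, "a")]:
    server.request(page, t)
print(server.broadcast_next(4.0), server.broadcast_next(4.0))   # a b
```

Because one broadcast clears every pending request for a page, popular pages are served to many clients at once. The choice of `choose` is exactly the scheduling question the next paragraphs address.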

In a large-scale implementation of such a system, an important consideration is the scheduling algorithm that the server uses to choose which request to service from its queue of waiting requests. We have developed a novel on-demand broadcast scheduling algorithm, called RxW [Akso98], which is a practical, low-overhead, and scalable approach that provides excellent performance across a range of scenarios.

The intuition behind the RxW scheduling algorithm is to provide balanced performance for hot (popular) and cold (less popular) pages. This intuition is based on our observations of previously proposed algorithms. We have observed that two low-overhead algorithms, Most Requests First (MRF) and First Come First Served (FCFS) [Dyke86, Wong88], have poor average-case performance because they favor the broadcasting of hot or cold pages, respectively. A third algorithm, Longest Wait First (LWF) [Dyke86, Wong88], was shown to provide fairer treatment of hot and cold pages and, therefore, good average-case performance. LWF, however, suffers from high overhead, making it impractical for a large-scale system.

Based on these observations, we set out to combine the two low-overhead approaches (MRF and FCFS) in a way that would balance their strengths and weaknesses. The RxW algorithm schedules the page with the maximal RxW value, where R is the number of outstanding requests for that page and W is the amount of time that the oldest of those requests has been waiting for the page. Thus, RxW schedules a page either because it has many outstanding requests or because there is at least one request that has waited for a long time.
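The policies compared above differ only in the score they maximize. The following sketch puts them side by side on one small queue. One assumption is labeled here and in the code: LWF is taken to score a page by the total waiting time of all its outstanding requests, which is how it is commonly described but is not spelled out in this text. The queue contents are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    page: str
    arrivals: List[float]   # arrival times of outstanding requests, oldest first


def r_count(e: Entry) -> int:
    return len(e.arrivals)


def oldest_wait(e: Entry, now: float) -> float:
    return now - e.arrivals[0]


POLICIES = {
    "MRF": lambda e, now: r_count(e),                          # most requests
    "FCFS": lambda e, now: oldest_wait(e, now),                # oldest request
    "LWF": lambda e, now: sum(now - t for t in e.arrivals),    # total wait (assumed definition)
    "RxW": lambda e, now: r_count(e) * oldest_wait(e, now),
}


def choose(policy: str, entries: List[Entry], now: float) -> str:
    score = POLICIES[policy]
    return max(entries, key=lambda e: score(e, now)).page


queue = [
    Entry("hot", [8, 8, 9, 9, 9]),   # many recent requests: R=5, W=2, RxW=10
    Entry("cold", [0]),              # one old request:       R=1, W=10, RxW=10
    Entry("warm", [4, 5]),           # in between:            R=2, W=6, RxW=12
]
for name in POLICIES:
    print(name, choose(name, queue, now=10))
# MRF hot
# FCFS cold
# LWF warm
# RxW warm
```

In this example MRF and FCFS each pick an extreme, while RxW agrees with LWF. Unlike the LWF score, the RxW score needs only R and the oldest arrival time.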

The algorithm works by maintaining two sorted lists (one ordered by R values and the other ordered by W values) threaded through the service queue, which has a single entry for each requested page of the database. Maintaining these sorted lists is fairly inexpensive, since they only need to be updated when a new request arrives at the server.⁵ These two sorted lists are used by a pruning technique to avoid an exhaustive search of the service queue for the maximal RxW value. This technique is depicted in Figure 6.

The search starts with the page at the top of the R list. The corresponding W value for that page is then used to compute a limit for possible W values. That is, after reading the top page in the R list, it is known that the maximum RxW-valued page cannot have a W value below this limit. Next, the entry for the page at the top of the W list is accessed and used to place a limit on the R value. The algorithm alternates between the two lists and stops when the limit is reached on one of them. This technique prunes the search space while still guaranteeing that the search will return the page with the maximum RxW value.

5 In contrast, for LWF the ordering can change over time, even in the absence of new requests.

Fig. 6. Pruning the Search Space
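The pruned search can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [Akso98]. The two lists are re-sorted on each call for brevity, whereas the real algorithm maintains them incrementally. Times are integers. The stopping tests use the bounds R_max and W_max, which is one concrete way to realize the "limits" described above. The names are hypothetical, and the usage code checks the result against an exhaustive scan.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    page: str
    r: int               # R: number of outstanding requests
    first_arrival: int   # arrival of the oldest request; W = now - first_arrival


def exhaustive_max(entries: List[Entry], now: int) -> int:
    """Reference answer: maximal RxW value over the whole service queue."""
    return max(e.r * (now - e.first_arrival) for e in entries)


def pruned_search(entries: List[Entry], now: int) -> Tuple[Optional[Entry], int]:
    """Alternate between the R list and the W list until a bound proves optimality."""
    r_list = sorted(entries, key=lambda e: e.r, reverse=True)
    w_list = sorted(entries, key=lambda e: e.first_arrival)   # oldest first = largest W
    r_max, w_max = r_list[0].r, now - w_list[0].first_arrival
    best: Optional[Entry] = None
    best_val, scanned = -1, 0
    i = j = 0
    from_r_list = True
    while i < len(r_list) and j < len(w_list):
        if from_r_list:
            e = r_list[i]
            i += 1
            # Entries not yet reached in the R list have R <= e.r and W <= w_max.
            if e.r * w_max <= best_val:
                break
        else:
            e = w_list[j]
            j += 1
            # Entries not yet reached in the W list have W <= this W and R <= r_max.
            if (now - e.first_arrival) * r_max <= best_val:
                break
        scanned += 1
        value = e.r * (now - e.first_arrival)
        if value > best_val:
            best, best_val = e, value
        from_r_list = not from_r_list
    return best, scanned


rng = random.Random(0)
queue = [Entry(f"p{k}", rng.randint(1, 20), rng.randint(0, 99)) for k in range(1000)]
best, scanned = pruned_search(queue, now=100)
assert best is not None and best.r * (100 - best.first_arrival) == exhaustive_max(queue, 100)
print(f"scanned {scanned} of {len(queue)} entries")
```

Any page not yet visited appears further down both lists, so once either bound falls to the best value found, no unvisited page can beat it. This is what makes the early stop exact.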

In our experiments [Akso98], the pruning technique proved effective, reducing the number of entries searched by 72%. While such a substantial saving is helpful, it is probably not sufficient to keep the scheduling overhead from ultimately becoming a limiting factor as the system is scaled to the huge applications that will be enabled by the national and global broadcasting systems currently being deployed.

To achieve even greater reductions in the search space, we developed an approximation-based version of the algorithm. By varying a single parameter α, this algorithm can be tuned from having the same behavior as the RxW algorithm described so far to being a constant-time approach. The approximate algorithm selects the first page it encounters whose RxW value is greater than or equal to α × threshold, where threshold is the running average of the RxW value of the last page that was broadcast and the threshold at that time.

The setting of α determines the performance tradeoffs among average waiting time, worst-case waiting time, and scheduling overhead. The smaller the value of the parameter, the fewer entries are likely to be scanned. At the extreme value of 0, the algorithm simply compares the top entry from both the R list and the W list and chooses the one with the higher RxW value. In this case, the complexity of making a scheduling decision is reduced to O(1), ensuring that broadcast scheduling will not become a bottleneck regardless of the broadcast bandwidth, database size, or workload intensity. We demonstrated the performance, scalability, and robustness of the different RxW variants through an extensive set of performance experiments described in [Akso98].
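The α-parameterized variant can be sketched by adding an early exit to the pruned scan. Several details here are assumptions rather than facts from the text, and they are labeled in the code as well. The scan examines the next entry of each list as a pair, so that α = 0 compares the two list heads as described. No threshold exists before the first broadcast, so the first decision is exact. A large α effectively disables the early exit, which leaves the exact pruned search. `ApproxRxW` is a hypothetical name.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    page: str
    r: int               # R: outstanding requests
    first_arrival: int   # W = now - first_arrival


class ApproxRxW:
    """Approximate RxW parameterised by alpha (hypothetical sketch)."""

    def __init__(self, alpha: float):
        self.alpha = alpha
        self.threshold: Optional[float] = None   # assumption: undefined before first broadcast

    def choose(self, entries: List[Entry], now: int) -> Tuple[Entry, int]:
        def rxw(e: Entry) -> int:
            return e.r * (now - e.first_arrival)

        r_list = sorted(entries, key=lambda e: e.r, reverse=True)
        w_list = sorted(entries, key=lambda e: e.first_arrival)   # largest W first
        r_max, w_max = r_list[0].r, now - w_list[0].first_arrival
        best, scanned = r_list[0], 0
        for k in range(len(entries)):
            for e in (r_list[k], w_list[k]):   # assumption: one step = next entry of each list
                scanned += 1
                if rxw(e) > rxw(best):
                    best = e
            # Approximate stop: the candidate is good enough relative to recent broadcasts.
            if self.threshold is not None and rxw(best) >= self.alpha * self.threshold:
                break
            # Exact stop, as in the pruned search: nothing unseen can beat best.
            if k + 1 < len(entries) and (
                r_list[k + 1].r * w_max <= rxw(best)
                or (now - w_list[k + 1].first_arrival) * r_max <= rxw(best)
            ):
                break
        value = rxw(best)
        # Running average of the last broadcast page's RxW value and the old threshold.
        self.threshold = value if self.threshold is None else (self.threshold + value) / 2
        return best, scanned


rng = random.Random(1)
queue = [Entry(f"p{k}", rng.randint(1, 20), rng.randint(0, 99)) for k in range(500)]
for alpha in (0.0, 1.0, 1e9):
    sched = ApproxRxW(alpha)
    sched.choose(queue, 100)                  # first decision: no threshold yet, exact
    page, scanned = sched.choose(queue, 100)  # second decision uses the threshold
    print(alpha, page.r * (100 - page.first_arrival), scanned)
```

With α = 0 the second decision scans exactly two entries. With a very large α it degenerates to the exact pruned search, which matches the two ends of the range described above.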


4.2  Topic  2:  Learning  User  Profiles 

User profiles, which encode the data needs and interests of users, are key components of push-based systems. From the user’s viewpoint, a profile provides a means of passively retrieving relevant information. A user can submit a profile to a push-based system once, and then continuously receive data that are (supposedly) relevant to him or her in a timely fashion without the need to submit the same query over and over again. This automatic flow of relevant information helps the user keep pace with the ever-increasing rate of information generation. From the system point of view, profiles fulfill a role similar to that of queries in database or information retrieval systems. In fact, profiles are a form of continuously executing query. In a large publish/subscribe system, the storage and access of user profiles can be resource-intensive. Additionally, given that user interests change over time, the profiles must be updated accordingly to reflect up-to-date information needs.

We have developed an algorithm called Multi-Modal (MM) for incrementally constructing and maintaining user profiles for filtering text-based data items [Ceti98]. MM can be tuned to trade off effectiveness (i.e., accuracy of the filtered data items) and efficiency of profile management. The algorithm receives relevance feedback from the users about the documents that they have seen (i.e., a binary indication of whether or not the document was considered useful), and uses this information to improve the current profile. An important aspect of MM is that it represents a user profile as multiple keyword vectors whose size and elements change dynamically based on user feedback.

In fact, it is this multi-modal representation of profiles that allows MM to trade off effectiveness and efficiency. More specifically, the algorithm can be tuned using a threshold parameter to produce profiles of different sizes. Let us consider the two boundary values of this threshold parameter to illustrate the tradeoff. When the threshold is set to 0, a user profile is represented by a single keyword vector, achieving an extremely low overhead for profile management but seriously limiting the effectiveness of the profile. At the other extreme, if the threshold is set to 1, we achieve an extremely fine-granularity user model; however, the profile size equals the number of relevant documents observed by the user, making it impractical to store and maintain profiles. Therefore, it is more desirable to consider intermediate threshold values, which will provide an optimal effectiveness/efficiency tradeoff for a given application.
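To illustrate how a single threshold can move a multi-vector profile between the two extremes described above, here is a hedged sketch. It is not the MM algorithm from [Ceti98]. It substitutes a simple incremental clustering: each relevant document is merged into its most similar profile vector when the cosine similarity reaches the threshold, and otherwise starts a new vector. Raw term counts stand in for keyword weights. `MultiVectorProfile` and its methods are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class MultiVectorProfile:
    """Profile kept as several keyword vectors (illustrative stand-in, not MM itself)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.vectors: List[Counter] = []

    def feedback_relevant(self, doc: Counter) -> None:
        # Merge into the closest vector if similar enough, else start a new one.
        # Threshold 0 always merges (one vector); threshold 1 merges only documents
        # pointing in the same direction (roughly one vector per relevant document).
        if self.vectors:
            closest = max(self.vectors, key=lambda v: cosine(v, doc))
            if cosine(closest, doc) >= self.threshold:
                closest.update(doc)
                return
        self.vectors.append(Counter(doc))

    def score(self, doc: Counter) -> float:
        """Filtering score for an incoming document: best match over all vectors."""
        return max((cosine(v, doc) for v in self.vectors), default=0.0)


docs = [Counter("weather radar storm".split()),
        Counter("stock market prices".split()),
        Counter("radar storm warning".split())]
for t in (0.0, 0.3, 1.0):
    profile = MultiVectorProfile(t)
    for d in docs:
        profile.feedback_relevant(d)
    print(t, len(profile.vectors))
# 0.0 1
# 0.3 2
# 1.0 3
```

The intermediate setting keeps the two unrelated interests (weather and stocks) apart while merging the two similar weather documents, which is the kind of middle ground the text argues for.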

We evaluated the utility of MM by experimentally investigating its ability to categorize pages from the World Wide Web. We used non-interpolated average precision as our primary effectiveness metric and focused on profile size for quantifying the efficiency of our approach. We demonstrated that MM achieves significantly higher precision values with modest increases in profile size. Additionally, we were able to achieve precision values with small profiles that were comparable to, or in some cases even better than, those obtained with maximum-sized profiles. The details of the algorithm, the experimental setting, and the results are discussed in [Ceti98].
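For reference, non-interpolated average precision, the effectiveness metric named above, averages the precision measured at the rank of each relevant document over all relevant documents. The function below is a generic sketch of that standard definition, not code from the evaluation in [Ceti98].

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision of one ranked list.

    ranked_relevance[k] tells whether the document at rank k + 1 is relevant;
    total_relevant is the number of relevant documents that exist, so relevant
    documents never retrieved contribute a precision of 0.
    """
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant document
    return precision_sum / total_relevant if total_relevant else 0.0


# Relevant documents at ranks 1 and 3: (1/1 + 2/3) / 2, about 0.833.
print(average_precision([True, False, True, False], total_relevant=2))
```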


5  Summary 

The increasing ability to interconnect computers through internetworking, mobile and wireless networks, and high-bandwidth content delivery to the home has resulted in a proliferation of dissemination-oriented applications. These applications present new challenges for data management throughout all components of a distributed information system. We have proposed the notion of a Dissemination-Based Information System (DBIS) that integrates many different data delivery mechanisms and described some of the unique aspects of such systems. We described our initial prototype of a DBIS Toolkit, which provides a platform for experimenting with different implementations of the DBIS components. Finally, we described our work on two of the many research issues that arise in the design of DBIS architectures.

Data dissemination and data broadcasting are very fertile and important areas for continued research and development. In fact, we see a migration of data management concerns from the traditional disk-oriented architectures of existing database systems to the more general notion of Network Data Management, in which the movement of data throughout a complex and heterogeneous distributed environment is of paramount concern. Our ongoing research efforts are aimed at better understanding the challenges and tradeoffs that arise in the development of such systems.
DBIS-Toolkit: Adaptive Middleware for Large Scale Data Delivery
Introduction


The proliferation of the Internet and intranets, advances in wireless and satellite networks, and the availability of asymmetric, high-bandwidth links to the home have fueled the development of a wide range of new “dissemination-based” applications. These applications involve the timely distribution of data to a large set of consumers, and include stock and sports tickers, traffic information systems, electronic personalized newspapers, and entertainment delivery. Dissemination-oriented applications have special characteristics that render traditional client-server data management approaches ineffective. These include tremendous scale, significant overlap in user data needs, and asymmetric data flow from sources to consumers.

The mismatch between the data access characteristics of these applications and the technologies used to implement them on the WWW results in scalability problems [Fran98]. For example, such applications employ the HTTP protocol, which uses a request-response (or client-server), unicast method of data delivery. Using request-response, each user sends requests for data to the server. The large audience for a popular event can generate huge spikes in the load at servers, resulting in long delays and overloaded servers. Compounding the situation is that users must continually poll the server to obtain the most current data, resulting in multiple requests for the same data items from each user. In an application such as an election result server, where the interests of a large part of the population are known a priori, most of these requests are unnecessary.

In order to address the needs of this new class of applications, we are developing a Dissemination-Based Information System (DBIS) toolkit. The toolkit serves as an adaptable middleware layer that [...] in a networked environment. The toolkit also includes facilities for performance monitoring, which can allow a system developer to examine the impact of using different data delivery mechanisms. We have implemented an initial version of this toolkit and have used it to develop a weather map dissemination application.


DBIS-Toolkit  Overview 
The DBIS Framework


The basic concepts of the DBIS framework were presented at the OOPSLA ’97 conference [Fran97]. A more recent description appears in [Akso98b]. The framework has two major features. First, it incorporates a number of different options for data delivery, including traditional request-response, publish/subscribe, Broadcast Disks [Acha95, Acha97], and on-demand broadcast [Akso98a]. Second, it is based on the notion of network transparency, which allows different data delivery mechanisms to be mixed and matched within a single application. Network transparency is provided through the use of Information Brokers, which acquire information and distribute it to other consumers. Brokers are middlemen; a broker acts as a client to some number of data sources, collects and possibly repackages the data it obtains, and then functions as a data source to other nodes of the system. Along the way, brokers may add value to the information, such as integrating it with data from other sources or enhancing its organizational structure. By creating hierarchies of brokers, information delivery can be tailored to the needs of many different users.


Figure 1: An Instantiation of a DBIS


Toolkit  Description 

The  toolkit  provides  a  set  of  application  programming  interfaces  (APIs)  and  libraries  that  allow  a 
developer  to  construct  and  experiment  with  a  DBIS  application.  Figure  1  shows  an  example  instantiation 
of  a  DBIS  using  the  current  toolkit.  The  DBIS-Toolkit  consists  of  four  main  components  (shown  as 
lightly-shaded  items  in  the  figure): 

Data  Source  (DS)  Library  -  a  data  source  wrapper  that  encapsulates  network  communication  and 
provides  conversion  functions  for  data. 

Client  Library  -  a  client  program  wrapper  that  encapsulates  network  communication  and  provides 
conversion  functions  for  queries  and  user  profiles.  It  also  provides  monitoring  and  filtering  of  broadcast 
or  multicast  channels. 

Information Broker (IB) - the main component of the DBIS-Toolkit. The IB contains communication, buffering, scheduling, and catalog management components and is described in more detail below.

Information  Broker  Master  -  The  IB  Master  is  responsible  for  managing  global  catalog  information 
about  data  and  the  topology  of  the  DBIS.  All  IBs  must  register  with  the  IB  Master  and  all  catalog 
updates  must  be  sent  to  the  IB  Master. 

In addition to these four components, the toolkit contains a flexible performance monitoring capability that can be used to graphically display real-time performance metrics, such as bandwidth and CPU utilization, response times, etc., on a per-IB basis.

Data  Modeling 

As the focus of this project to date has been on the "plumbing" required to integrate multiple forms of
data delivery at the application level, the current prototype uses a very simple data model consisting of
categories and keywords within those categories. Categories and keywords are used in the specification
of queries and profiles. Queries are pull requests that are transmitted from a client to a data source (via
one or more IBs). Queries consist of a category and optional keywords. Queries are ultimately processed
at a data source: all data items that match the category and at least one keyword (if specified) are sent
to the client from which the query originated. In contrast, profiles are used to support push-based
delivery. When a new data item arrives at an IB, its category and keywords are compared with the user
profiles registered at that IB, and the item is sent to any clients whose profile indicates an interest in the
item. Thus, profiles can be viewed as a form of continually executing queries. The integration of more
sophisticated data models, such as (XML-based) semistructured models and more flexible IR-style
models, is one aspect of our ongoing development of the toolkit.
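The query and profile matching rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual API: the names DataItem, Query, ProfileTable, and run_query are hypothetical, and keywords are assumed to match by exact string equality. The same matching test serves both delivery styles; only where and how often it runs differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    category: str
    keywords: frozenset
    payload: str = ""


@dataclass(frozen=True)
class Query:
    """Hypothetical query or profile: a category plus optional keywords."""
    category: str
    keywords: frozenset = frozenset()

    def matches(self, item: DataItem) -> bool:
        # The category must match; if keywords were given, at least one must overlap.
        if item.category != self.category:
            return False
        return not self.keywords or bool(self.keywords & item.keywords)


def run_query(source_items, query):
    """Pull: a query is processed once, at the data source."""
    return [item for item in source_items if query.matches(item)]


class ProfileTable:
    """Push: profiles registered at one IB act as continually executing queries."""

    def __init__(self):
        self.profiles = {}  # client id -> list of Query

    def register(self, client, profile):
        self.profiles.setdefault(client, []).append(profile)

    def on_new_item(self, item):
        # Return the clients whose profile indicates interest in the arriving item.
        return sorted(
            client
            for client, queries in self.profiles.items()
            if any(q.matches(item) for q in queries)
        )


maps = [
    DataItem("weather", frozenset({"radar", "midwest"})),
    DataItem("weather", frozenset({"satellite", "northeast"})),
]
table = ProfileTable()
table.register("alice", Query("weather", frozenset({"satellite"})))
print(len(run_query(maps, Query("weather", frozenset({"radar"})))))  # 1
print(table.on_new_item(maps[1]))  # ['alice']
```

A pull query is answered from whatever the source holds at that moment, whereas the profile keeps matching every later arrival, which is the sense in which a profile is a continually executing query.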
Information Broker Architecture

As  stated  above,  the  IB  contains  much  of  the  functionality  of  the  DBIS-Toolkit.  The  IB  module  (shown 
in  Figure  2)  consists  of  the  following  components: 

Data  Source  Manager  (DSM)  -  This  component  obtains  (via  push  or  pull)  data  items  from  the  data 
sources  and  matches  them  with  client  pull  requests  or  profiles. 

Broker  Agent  (BA)  -  This  component  performs  similar  functions  as  the  DSM  but  for  sources  that  are 
actually  other  IBs  (rather  than  data  sources).  In  addition,  the  BA  handles  other  IB-to-IB  functions  such 
as  profile  and  request  forwarding. 

Catalog  Manager  -  This  component  manages  local  copies  of  catalog  information  for  use  by  the 
processes  running  at  the  broker.  All  catalog  changes  are  sent  to  the  IB  Master,  which  propagates  them  to 
the  catalog  managers  of  all  other  IBs. 

Broadcast  Manager  -  Once  data  have  been  filtered  through  the  DSM  or  BA,  they  are  passed  to  the 
Broadcast  Manager,  which  has  two  main  components.  The  Mapper  assigns  data  items  to  one  or  more 
physical  communication  channels.  The  Scheduler  makes  decisions  about  the  order  in  which  data  items 
should be placed on those channels.
Network Manager - This is the lowest level of the communication component of the IB. It sends data
packets to the network according to the information provided by the Broadcast Manager.

Client  Manager  -  This  module  handles  requests  that  arrive  from  the  IB’s  clients.  It  forwards  them  to  the 
proper  modules  within  the  IB  and  maintains  communication  sessions  with  the  clients. 
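To make the division of labor inside the Broadcast Manager concrete, the sketch below separates the Mapper (which channels an item goes to) from the Scheduler (in what order items leave each channel). The category-to-channel table, the channel names, and the numeric priorities are assumptions made for illustration; the text does not specify the policies the IB actually uses, and all class names here are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict


class Mapper:
    """Assigns a data item to one or more channels (hypothetical lookup table)."""

    def __init__(self, table, default_channel="unicast"):
        self.table = table  # category -> list of channel names
        self.default_channel = default_channel

    def channels_for(self, category):
        return self.table.get(category, [self.default_channel])


class Scheduler:
    """Orders items per channel: lowest priority value first, FIFO among ties."""

    def __init__(self):
        self.queues = defaultdict(list)
        self.counter = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, channel, item, priority):
        heapq.heappush(self.queues[channel], (priority, next(self.counter), item))

    def next_item(self, channel):
        queue = self.queues[channel]
        return heapq.heappop(queue)[2] if queue else None


class BroadcastManager:
    def __init__(self, mapper, scheduler):
        self.mapper = mapper
        self.scheduler = scheduler

    def submit(self, item, category, priority=0):
        # Called once the DSM or BA has filtered the item.
        for channel in self.mapper.channels_for(category):
            self.scheduler.enqueue(channel, item, priority)

    def drain(self, channel):
        """The order in which a Network Manager would send items on `channel`."""
        sent = []
        while (item := self.scheduler.next_item(channel)) is not None:
            sent.append(item)
        return sent


bm = BroadcastManager(Mapper({"radar": ["mcast-1"], "satellite": ["mcast-1", "mcast-2"]}), Scheduler())
bm.submit("map-A", "radar", priority=1)
bm.submit("map-B", "satellite", priority=0)
print(bm.drain("mcast-1"))  # ['map-B', 'map-A']
```

Keeping mapping and ordering in separate objects means either policy could be swapped (for example, a Broadcast Disks schedule in place of the priority queue) without touching the other.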

A  DBIS  Application 

An  initial  version  of  the  DBIS-Toolkit  has  been  built  using  Windows  NT  and  its  IP  Multicast  support. 
The  toolkit  has  been  used  to  create  a  weather  map  dissemination  application  (see  Figure  3).  In  this 
application, “map servers” send out updated maps of different types (e.g., radar, satellite image) for
different  regions  of  the  United  States.  Clients  can  subscribe  to  receive  updates  for  specific  types  of  maps 
for  specific  regions.  Users  can  also  pose  queries  to  obtain  the  most  recent  versions  of  specific  maps  or  to 
zoom  in  on  specific  regions  of  the  maps.  Maps  are  delivered  over  unicast  or  multicast  links.  The 
application  serves  as  a  demonstration  vehicle  emphasizing  the  following  unique  aspects  of  the 
DBIS-Toolkit: 

•  The  incorporation  of  multiple  delivery  mechanisms  and  the  ways  in  which  they  are  supported  by 
the  various  components  of  the  toolkit. 

•  The ability to make efficient use of available resources by choosing appropriate delivery
mechanisms.

•  The exploitation of Network Transparency through the use of multiple levels of Information
Brokers.

•  The  ability  to  monitor  the  system  dynamically  using  the  graphical  performance  monitor. 
Figure 3: Example DBIS Application
Abstract

The dramatic improvements in global interconnectivity due
to intranets, extranets, and the Internet have led to an explosion
in the number and variety of new data-intensive applications.
Along with the proliferation of these new applications
have come increased problems of scale. This is demonstrated
by frequent delays and service disruptions when accessing
networked data sources. Recently, push-based techniques
have been proposed as a solution to scalability problems for
distributed applications. This paper argues that push indeed
has its place, but that it is just one aspect of a much
larger design space for distributed information systems. We
propose the notion of a Dissemination-Based Information
System (DBIS) which integrates a variety of data delivery
mechanisms and information broker hierarchies. We discuss
the properties of such systems and provide some insight into
the architectural imperatives that will influence their design.
The DBIS framework can serve as the basis for development
of a toolkit for constructing distributed information systems
that better match the technology they employ to the characteristics
of the applications they are intended to support.


1  Introduction 

1.1  The  World-Wide  Wait 

The scenario is all too familiar: a major event, such as
a national election, is underway and the latest, up-to-the-minute
results are being posted on the Web. You want to
monitor the results for the important national races and
for the races in your state, so you fire up your trusty web
browser, point it at the election result web site and wait,
and wait, and wait.... What’s the problem? It could be
any number of technical glitches: a congested network, an
overloaded server, or even a crashed server. In a larger sense,
however, the problem is one of scalability; the system cannot
keep up with the heavy load caused by the (transient) surge
in activity that occurs in such situations.

*This work has been partially supported by the NSF under grant
IRI-9501353, by Rome Labs Agreement Number F30602-97-2-0241
under ARPA order number F078, by an IBM Cooperative Graduate
Fellowship, and by research funding and equipment from Intel
Corporation.

We  argue  that  such  scalability  problems  are  the  result 
of  a  mismatch  between  the  data  access  characteristics  of  the 
application  and  the  technology  (in  this  case,  HTTP)  used  to 
implement  the  application.  An  election  result  server,  such 
as  that  of  the  preceding  scenario,  is  an  example  of  a  data 
dissemination-oriented application. Data dissemination involves
the delivery of data from one or more sources to a
large set of consumers. Many dissemination-oriented applications
have data access characteristics that differ significantly
from the traditional notion of client-server applications
as embodied in navigational web browsing technology.
For example, the election result server has the following
characteristics: 1) There is a huge population of users
(potentially many millions) who want to access the data; 2)
There is a tremendous degree of overlap among the interests
of the user population; 3) Users who are following the event
closely are interested only in new data and changes to the
existing data; and, 4) The amount of data that must be sent
to most users is fairly small. When looking at these characteristics,
it becomes clear that the request-response (i.e.,
RPC), unicast (i.e., point-to-point) method of data delivery
used by HTTP is the wrong approach for this application.

Using  request-response,  each  user  sends  requests  for  data 
to  the  server.  The  large  audience  for  a  popular  event  can 
generate  huge  spikes  in  the  load  at  servers,  resulting  in  long 
delays  and  server  crashes.  Compounding  the  situation  is 
that users must continually poll the server to obtain the most
current  data,  resulting  in  multiple  requests  for  the  same  data 
items  from  each  user.  In  this  example  application,  where  the 
desires  of  a  large  part  of  the  population  are  known  a  priori, 
most  of  these  requests  are  unnecessary. 

The use of unicast data delivery likewise causes problems
in the opposite direction (from servers to clients). With unicast,
the server is required to respond individually to each
request, often transmitting identical data. For an application
with many users, the costs of this repetition in terms
of network bandwidth and server cycles can be devastating.

1.2  Is  “Push”  the  Answer? 

The above scenario is well known to web users and, not surprisingly,
an increasing number of products are being introduced
to address it. A number of these products have received
tremendous media attention lately because they are
based on a technology called data push. Using data push,
the transmission of data to users is initiated without requiring
the users to explicitly request it. Examples of systems
that employ some form of push technology include PointCast,
Marimba, BackWeb, and AirMedia. Push has also
been added to recent versions of the major Web browsers,
and the battle for data push standards is well underway.

Systems  that  are  truly  implemented  with  data  push  can 
indeed  solve  some  of  the  scalability  problems  attributed  above 
to  request-response.  Since  users  do  not  have  to  poll  servers 
for  new  and  updated  data,  the  number  of  client  requests  that 
must  be  handled  by  a  server  can  be  reduced  dramatically. 
Simply  changing  from  a  client  “Pull”  model  to  a  push  model, 
however,  does  not  solve  all  the  problems  for  an  application 
such  as  the  election  result  server.  In  particular,  performing 
push  to  millions  of  clients  using  a  unicast  communication 
protocol  does  little  to  address  network  bandwidth  problems 
and  still  requires  the  server  to  perform  substantial  work  for 
each  client  it  is  serving.  Compounding  the  confusion  is  the 
fact that many systems that provide a “push” interface to
users are actually implemented using a programmed polling
mechanism. These systems simply save the user from having
to click, but do nothing to solve the scalability problems
caused by the request-response approach.

The election result server is an example of just one type
of dissemination-oriented application. Other examples include
news and entertainment delivery, software distribution,
traffic information systems, and navigational web browsing.
These applications differ widely in the characteristics
of the data involved (e.g., size, consistency constraints,
etc.), access patterns, and communication channel properties
(e.g., symmetric vs. asymmetric, continuously or intermittently
connected, etc.). No one data delivery mechanism
can provide adequate support for the wide variety of such
applications.

To address this need, we are developing a general framework
for describing and ultimately constructing Dissemination-Based
Information Systems (DBIS). In this framework, push
vs. pull is a choice along just one of several dimensions of
the design space for data delivery mechanisms. In this paper,
we outline a number of data delivery mechanisms and
investigate the tradeoffs among them. The goal is to develop
a flexible architecture that is capable of supporting
a wide range of applications across many varied environments,
such as mobile networks, satellite-based systems, and
wide-area networks. By combining the various data delivery
techniques in a way that matches the characteristics of
the application and achieves the most efficient use of the
available server and communication resources, the scalability
and performance of dissemination-oriented applications
can be greatly enhanced.

1.3  Overview  of  the  Approach 

We view an integrated DBIS as a distributed system in
which  the  links  between  the  computing  elements  vary  in 
character:  from  standard  pull-based  unicast  connections  to 
periodic  push  over  a  broadcast  channel.  A  key  point  is  that 
the  character  of  a  link  should  be  of  concern  only  to  the  nodes 
on  either  end.  For  example,  the  fact  that  an  information 
provider  receives  its  data  from  a  broadcast  link  as  opposed 
to  a  request-response  protocol  should  make  no  difference  to 
clients  of  that  provider. 

In our approach, we distinguish between three types of
nodes: (1) data sources provide the base data for the application;
(2) clients consume this information; and (3) information
brokers add value to information and redistribute
it. By creating hierarchies of these nodes connected by various
data delivery mechanisms, the information flow can be
tailored to the needs of many different applications.

We aim to provide a toolkit of architectural components
that can be used to construct a DBIS. A builder of an information
resource would make use of these components to
construct the interfaces to their service. Example components
include a broadcast generator, a set of dissemination
services, a client cache manager, a client prefetcher,
a backchannel monitor, etc.

In the remainder of the paper we outline our current
ideas on the development of such a toolkit. Section 2 describes
several options for data delivery mechanisms (i.e.,
the “links”) and discusses the tradeoffs among them. Section
3 addresses the various types of nodes in a DBIS. Section
4 uses the DBIS model to describe several existing
dissemination-oriented systems. Section 5 outlines issues in
the development of a DBIS toolkit. Section 6 lists related
work. Finally, Section 7 presents our conclusions.

2  Options  for  Data  Delivery 

As  stated  in  the  Introduction,  a  key  aspect  of  the  DBIS 
framework  is  that  it  supports  a  wide  variety  of  links  for  data 
delivery  between  sources  and  clients.  Support  for  different 
styles  of  data  delivery  allows  a  DBIS  to  be  optimized  for 
various server, client, network, data, and application properties.

2.1  Three  Characteristics 

We  identify  three  main  characteristics  that  can  be  used  to 
compare  data  delivery  mechanisms:  (1)  push  vs.  pull;  (2) 
periodic  vs.  aperiodic;  and  (3)  unicast  vs.  1-to-N.  Figure  1 
shows these characteristics and how several common mechanisms
relate to them.

2.1.1  Client  Pull  vs.  Server  Push 

The first distinction we make among data delivery styles is
that of “push vs. pull”. Current database servers and object
repositories manage data for clients that explicitly request
data when they require it. When a request is received at
a server, the server locates the information of interest and
returns it to the client. This request-response style of operation
is pull-based: the transfer of information from servers
to clients is initiated by a client pull. In contrast, push-based
data delivery involves sending information to a client population
in advance of any specific request. With push-based
delivery, the server initiates the transfer.


2.1.2  Aperiodic  vs.  Periodic 

Both push and pull can be performed in either an aperiodic
or periodic fashion. Aperiodic delivery is event-driven:
a data request (for pull) or transmission (for push) is
triggered by an event such as a user action (for pull) or
data update (for push). In contrast, periodic delivery is
performed according to some pre-arranged schedule. This
schedule may be fixed, or may be generated with some degree
of randomness.¹ An application that sends out stock
prices on a regular basis is an example of periodic push,
whereas one that sends out stock prices only when they
change is an example of aperiodic push.
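The stock-price example can be made concrete with a small Python sketch. The function names and the price series are hypothetical; time is modeled as discrete ticks with one observed price per tick.

```python
def periodic_push(prices, period):
    """Periodic push: send the current price every `period` ticks, changed or not."""
    return [(tick, price) for tick, price in enumerate(prices) if tick % period == 0]


def aperiodic_push(prices):
    """Aperiodic push: send a price only when it changes (event-driven)."""
    sent, last = [], None
    for tick, price in enumerate(prices):
        if price != last:
            sent.append((tick, price))
            last = price
    return sent


prices = [10, 10, 10, 11, 11, 12, 12, 12]  # one observed price per tick
print(periodic_push(prices, period=2))  # [(0, 10), (2, 10), (4, 11), (6, 12)]
print(aperiodic_push(prices))           # [(0, 10), (3, 11), (5, 12)]
```

The periodic schedule resends the unchanged price at tick 2 and delivers the change made at tick 3 one tick late; the aperiodic version sends only the changes, but a listener cannot predict when the next transmission will occur.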

2.1.3  Unicast  vs.  1-to-N 

The third characteristic of data delivery mechanisms we
identify is whether they are based on unicast or 1-to-N communication.
With unicast communication, data items are
sent from a data source (e.g., a single server) to one other
machine, while 1-to-N communication allows multiple machines
to receive the data sent by a data source. Two types
of 1-to-N data delivery can be distinguished: multicast and
broadcast. With multicast, data is sent to a specific subset
of clients. In some systems multicast is implemented by
sending a message to a router that maintains the list of recipients.
The router reroutes the message to each member
of the list. Since the list of recipients is known, it is possible
to make multicast reliable; that is, network protocols
can be developed that guarantee the eventual delivery of
the message to all clients that should receive it. In contrast,
broadcasting sends information over a medium on which an
unidentified and unbounded set of clients can listen. This
differs from multicast in that the clients who may receive
the data are not known a priori.
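Why a known recipient list makes reliable multicast possible can be illustrated with a toy retransmission loop: the router keeps resending to members that have not acknowledged, until all have or a retry limit is reached. This is a simplified sketch, not a real multicast protocol; the class name and the `deliver` callback that models loss and acknowledgement are hypothetical. With broadcast there is no membership set to iterate over, so this loop cannot be written.

```python
class MulticastRouter:
    """Toy router that keeps the recipient list for each multicast group."""

    def __init__(self):
        self.groups = {}  # group name -> set of member ids

    def join(self, group, member):
        self.groups.setdefault(group, set()).add(member)

    def send_reliable(self, group, message, deliver, max_rounds=5):
        """Resend `message` to members that have not acknowledged it.

        `deliver(member, message, attempt)` stands in for the network and
        returns True when the member acknowledges receipt.
        Returns (rounds used, members still unacknowledged).
        """
        pending = set(self.groups.get(group, ()))
        rounds = 0
        while pending and rounds < max_rounds:
            rounds += 1
            pending = {m for m in pending if not deliver(m, message, rounds)}
        return rounds, pending


router = MulticastRouter()
for member in ("c1", "c2", "c3"):
    router.join("radar-maps", member)

lost = {("c2", 1)}  # simulate one lost copy: c2's first attempt
print(router.send_reliable("radar-maps", "map-17", lambda m, msg, a: (m, a) not in lost))
# (2, set())
```

Real reliable multicast protocols use more scalable feedback schemes than per-member acknowledgements; the point here is only that the guarantee depends on knowing who the recipients are.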

2.2  Classification  of  Delivery  Mechanisms 

It is possible to classify some existing data delivery mechanisms
using the characteristics described above. Such a
classification  is  shown  in  Figure  1.  We  discuss  several  of  the 
leaves  in  this  diagram  below. 

1 For the purposes of this discussion, we do not distinguish between
fixed and randomized schedules. Such a distinction is important in
certain applications. For example, algorithms for conserving energy
in mobile environments proposed by Imielinski et al. [Imie94b] depend
on a strict schedule to allow mobile clients to “doze” during periods
when no data of interest to them will be broadcast.
Figure 1: Data Delivery Options


Request/Response - Traditional request/response mechanisms
use aperiodic pull over a unicast connection. If instead
a 1-to-N connection is used, then clients can “snoop”
on the requests made by other clients, and obtain data that
they haven't explicitly asked for.

Polling - In some applications, such as remote sensing,
a system may periodically send requests to other sites to obtain
status information or to detect changed values. If the
information is returned over a 1-to-N link, then, as with
request/response, other clients can snoop to obtain data items
as they go by.

Publish/Subscribe - Publish/subscribe protocols are
becoming a popular way to disseminate information in a
network [Oki93, Yan95, Glan96]. Publish/subscribe is push-based:
data flow is initiated by the data sources, and is aperiodic,
as there is no predefined schedule for sending data.
Such protocols are typically performed in a 1-to-N fashion,
but a similar protocol can be used over a unicast channel,
as is done for triggers in active database systems.

Broadcast Disks - Periodic push has been used for data
dissemination in many systems such as TeleText [Amma85,
Wong88], DataCycle [Herm87, Bowe92], Broadcast Disks
[Acha95a, Acha95b] and mobile databases [Imie94a]. Clients
needing access to a data item that is pushed periodically can
wait until the item appears. As with aperiodic push, periodic
push can also be used with both unicast and 1-to-N
channels, but we believe that 1-to-N is likely to be much
more prevalent.
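Because the three characteristics are independent, each mechanism can be described as a point in a 2 × 2 × 2 space. The Python sketch below encodes the leaves discussed above in that way; the type and field names are our own, and the table is an illustration of the text rather than a transcription of Figure 1.

```python
from dataclasses import dataclass
from enum import Enum


class Initiation(Enum):
    PULL = "pull"
    PUSH = "push"


class Timing(Enum):
    APERIODIC = "aperiodic"
    PERIODIC = "periodic"


class Fanout(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-to-N"


@dataclass(frozen=True)
class DeliveryMode:
    initiation: Initiation
    timing: Timing
    fanout: Fanout

    def allows_snooping(self):
        # Pulled data sent over a 1-to-N link can be picked up by clients
        # that never asked for it.
        return self.initiation is Initiation.PULL and self.fanout is Fanout.ONE_TO_N


MECHANISMS = {
    "request/response": DeliveryMode(Initiation.PULL, Timing.APERIODIC, Fanout.UNICAST),
    "request/response with snooping": DeliveryMode(Initiation.PULL, Timing.APERIODIC, Fanout.ONE_TO_N),
    "polling": DeliveryMode(Initiation.PULL, Timing.PERIODIC, Fanout.UNICAST),
    "polling with snooping": DeliveryMode(Initiation.PULL, Timing.PERIODIC, Fanout.ONE_TO_N),
    "publish/subscribe": DeliveryMode(Initiation.PUSH, Timing.APERIODIC, Fanout.ONE_TO_N),
    "triggers (unicast publish/subscribe)": DeliveryMode(Initiation.PUSH, Timing.APERIODIC, Fanout.UNICAST),
    "broadcast disks": DeliveryMode(Initiation.PUSH, Timing.PERIODIC, Fanout.ONE_TO_N),
}
```

Filtering MECHANISMS by `allows_snooping()` picks out exactly the two pull mechanisms run over 1-to-N links. The one combination not named in the text, periodic push over unicast, is possible but, as noted above, expected to be rare.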

2.3  Some  Example  Tradeoffs 

As can be seen from the preceding discussion, the design
space for data delivery mechanisms is quite large. Choosing
the proper mechanism (or combination of them) to use
for a given link requires an understanding of the tradeoffs
among them. In a recent paper, we studied one such set
of tradeoffs; namely, those between broadcasting data using
periodic push (Broadcast Disks) and aperiodic pull (request-response
with snooping) [Acha97]. Here, we briefly discuss
some observations from that study.

The  tradeoffs  between  push  and  pull  in  general  revolve 
around the costs of initiating the transfer of data. A pull-based
approach requires the use of a backchannel for each
request.  Furthermore,  as  described  in  the  Introduction,  the 
server  must  be  interrupted  continuously  to  deal  with  such 
requests  and  has  limited  flexibility  in  scheduling  the  order  of 
data  delivery.  Also,  the  information  that  clients  can  obtain 
from  a  server  is  limited  to  that  which  the  clients  know  to 
ask  for.  Thus,  new  data  items  or  updates  to  existing  data 
items  may  go  unnoticed  at  clients  unless  they  periodically 
poll  the  server. 

Push-based approaches, in contrast, avoid the issues identified
for client pull, but have the problem of deciding which
data to send to clients in the absence of specific requests.
Clearly, sending irrelevant data to clients is a waste of resources.
A more serious problem, however, is that in the
absence of requests it is possible that the servers will not
deliver the specific data needed by clients in a timely fashion
(if ever). Thus, the usefulness of server push is dependent
on the ability of a server to accurately predict the needs of
clients. One solution to this problem is to allow the clients
to provide a profile of their interests to the servers. As mentioned
above, publish/subscribe protocols are one popular
mechanism for providing such profiles.

In [Acha97] we studied a hybrid push/pull broadcast system.
In this system, a broadcast server is responsible for
allocating a fixed broadcast bandwidth between data items
(pages) that are broadcast according to a fixed schedule (i.e.,
periodic push) and pages that are broadcast in response to
client requests sent over a backchannel (i.e., aperiodic pull).
The fundamental performance tradeoff between these two
approaches can be seen in Figure 2, which shows results
from [Acha97].² The x-axis in the figure models the number
of clients (all having identical access rates and distributions)
that are accessing data from the broadcast. Thus, at a value
of 250, the broadcast is serving 25 times as many clients as
at a value of 10. The y-axis indicates the average number
of items that a client must watch go by on the broadcast
before the item it wants appears.


Figure  2:  Push  vs.  Pull  for  Broadcast 

The flat line in the figure (marked by diamonds) indicates
the performance of a pure push approach, in which all
data is broadcast repeatedly with no requests sent by the
clients. This figure was generated using a skewed (Zipfian)
access pattern over 1000 items. The broadcast schedule used
by the push approach was tailored to support a skewed access
pattern through the use of Broadcast Disks, which allow
the frequency of broadcast for an item to be based on that
item's popularity [Acha95a, Acha95b]. As can be seen in
the figure, the performance of pure push is independent of
the number of clients listening to the broadcast. This
is a fundamental property of data broadcast using periodic
push: if there is a large overlap in the interests of clients,
it provides tremendous scalability in terms of client population.
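One way to let broadcast frequency follow popularity is to group items into "disks" that spin at different relative speeds and interleave them into a single periodic schedule. The sketch below follows the chunk-interleaving construction described in the Broadcast Disks papers [Acha95a] as we understand it; the function name, the requirement that each disk divide evenly into chunks, and the example data are our own simplifications and should be checked against that work.

```python
from math import lcm  # Python 3.9+


def broadcast_disk_schedule(disks, rel_freqs):
    """Build one major cycle of a multi-disk broadcast.

    disks: lists of item ids, hottest disk first.
    rel_freqs: integer broadcast frequency of each disk relative to the others.
    """
    max_chunks = lcm(*rel_freqs)
    chunked = []
    for items, freq in zip(disks, rel_freqs):
        num_chunks = max_chunks // freq  # slower disks are cut into more chunks
        if len(items) % num_chunks:
            raise ValueError("disk size must be a multiple of its chunk count")
        size = len(items) // num_chunks
        chunked.append([items[k * size:(k + 1) * size] for k in range(num_chunks)])

    schedule = []
    for i in range(max_chunks):  # each minor cycle takes one chunk from every disk
        for chunks in chunked:
            schedule.extend(chunks[i % len(chunks)])
    return schedule


cycle = broadcast_disk_schedule([["a"], ["b", "c"], ["d", "e", "f", "g"]], [4, 2, 1])
print("".join(cycle))  # abdaceabfacg
```

Item a (the hottest) appears four times per cycle, b and c twice, and d through g once, with the copies of each item evenly spaced. The schedule depends only on the assumed access distribution, not on how many clients are listening, which is why the push curve stays flat as the client population grows.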

The other curve in the figure (marked by boxes) shows
the performance of a pull-based approach, in which clients
submit requests to the server via the backchannel, and the
server broadcasts the requested pages in FIFO order.³ As
can be seen in the figure, the pull-based approach exhibits
an S-shaped behavior: it provides extremely fast response
time for a lightly loaded server, but as the server becomes
loaded, its performance degrades, until it ultimately stabilizes
(in this case, at a value of 500 items, or half the size of
the database being broadcast here).

2 We briefly summarize these results here; interested readers are
referred to [Acha97] for more details.

The behavior of aperiodic pull in this case can be explained
as follows. With a lightly loaded system, the server
is typically idle, so it can respond immediately when a request
is received. As the load increases, however, the server
saturates and becomes less responsive. Compared to periodic
push, it is clear that aperiodic pull demonstrates less
scalability in this case. It is, however, important to note
that aperiodic pull over a unicast channel would be far less
scalable: wait time would increase in an unbounded fashion
as the server approached saturation. In contrast, using
broadcast, the performance of aperiodic pull eventually
flattens out in this case, because of the overlap in the interests
of the client population. Once the server reaches the
state where all data items are in the FIFO queue, additional
clients receive all of their data by simply “snooping” on the
broadcast. In this case the performance of aperiodic pull
at saturation is worse than that of periodic push, because
the broadcast schedule generated by the FIFO discipline is
less well suited to the access pattern than the pre-computed
schedule used by periodic push. As discussed in [Acha97],
the problems of pull can be exacerbated if the server drops
client requests when it becomes overloaded.
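The FIFO pull discipline, including the rule that a request is not enqueued for an item already waiting in the queue, can be sketched as follows. The class and method names are hypothetical, time is counted in broadcast slots, and every pending request for an item is satisfied by its single broadcast, which models clients snooping on one another's requests. Because duplicates are suppressed, the queue never holds more than one entry per item. Once every item is queued, the broadcast degenerates into a flat cycle over the whole database, so a new request waits on average about half a cycle, which is consistent with the plateau near 500 of 1000 items described above.

```python
from collections import deque


class FifoPullBroadcast:
    """Aperiodic pull over a broadcast channel with a FIFO request queue."""

    def __init__(self):
        self.queue = deque()  # items awaiting broadcast, in request order
        self.queued = set()   # fast membership test for duplicate suppression
        self.waiting = {}     # item -> list of (client, request slot)

    def request(self, client, item, now):
        self.waiting.setdefault(item, []).append((client, now))
        if item not in self.queued:  # one broadcast will satisfy every waiter
            self.queue.append(item)
            self.queued.add(item)

    def broadcast_next(self, now):
        """Send the head of the queue; return (client, wait) for each satisfied client."""
        if not self.queue:
            return []
        item = self.queue.popleft()
        self.queued.discard(item)
        return [(client, now - t) for client, t in self.waiting.pop(item, [])]


server = FifoPullBroadcast()
server.request("c1", "x", now=0)
server.request("c2", "y", now=0)
server.request("c3", "x", now=0)     # duplicate: not enqueued again
print(len(server.queue))             # 2
print(server.broadcast_next(now=1))  # [('c1', 1), ('c3', 1)]
print(server.broadcast_next(now=2))  # [('c2', 2)]
```

The queue length is bounded by the number of distinct items rather than the number of clients, which is why pull over broadcast flattens out instead of degrading without bound as unicast pull would.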

The  tradeoffs  described  above  give  an  indication  of  the 
kinds  of  concerns  that  must  be  balanced  when  choosing  the 
proper data delivery mechanism for a given situation. Another
set of options arises in the organization of the nodes
for  a  DBIS,  as  described  in  the  following  section. 

3  Design  Options  for  Nodes 

While the discussion so far has focused on the ways in which
data is communicated between computing devices, the nodes
in a Dissemination-Based Information System play a crucial
role as well: the nodes provide the glue that pastes various
data distribution schemes together. A DBIS toolkit
should contain classes that model some of the basic features
of nodes. This section outlines some of those features.

3 Because a single broadcast of an item satisfies all clients waiting
for that item, we do not enqueue a request for an item that is already
in the FIFO queue.

3.1  Classification 

In an integrated DBIS, there will be three types of nodes:
(1) data sources, which provide the base data that is to be
disseminated; (2) clients, which are net consumers of information;
and (3) information brokers, which acquire information
from other sources, add value to that information (e.g.,
some additional computation or organizational structure)
and then distribute this information to other consumers. By
creating hierarchies of brokers, information delivery can be
tailored to the needs of many different users.

Information  brokers  perform  many  important  functions 
in  our  architecture.  While  the  previous  discussion  focused 
primarily  on  different  modes  of  data  delivery,  the  brokers 
provide the glue that binds these modes together. It is typically
the expected usage patterns of the brokers that will
drive  the  selection  of  which  mode  of  delivery  to  use.  For 
example,  a  broker  that  typically  is  very  heavily  loaded  with 
requests  could  be  an  excellent  candidate  for  a  push-based 
delivery  mechanism  to  its  clients. 

As we move upstream in the data delivery chain, brokers
look like data sources to their clients. Receivers of information
cannot detect the details of interconnections any further
upstream than their immediate predecessor. This principle
of network transparency allows data delivery mechanisms to
change without having global impact. Suppose that node B
is pulling data values from node A on demand. Further, suppose
that node C is listening to a cyclic broadcast from node
B which includes values that B has pulled from A. Node
C will not have to change its data gathering strategy if A
begins to push values to B; changes in links are negotiated
purely between the two nodes involved.
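The A, B, C example can be sketched in Python to show where network transparency comes from: C depends only on the interface B exposes, so B's upstream link can change from pull to push without any change to C. All names here (Source, Origin, PullBroker, PushBroker, broadcast_cycle, listen) are hypothetical, and B's cyclic broadcast is reduced to a single pass over a list of keys.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Source(ABC):
    """Anything a downstream node can obtain data from."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class Origin(Source):
    """Node A: holds the base data and can push updates to subscribers."""

    def __init__(self, data):
        self.data = dict(data)
        self.subscribers = []

    def get(self, key):
        return self.data.get(key)

    def update(self, key, value):
        self.data[key] = value
        for subscriber in self.subscribers:
            subscriber.on_push(key, value)


class PullBroker(Source):
    """Node B, version 1: pulls values from A on demand."""

    def __init__(self, upstream: Source):
        self.upstream = upstream

    def get(self, key):
        return self.upstream.get(key)


class PushBroker(Source):
    """Node B, version 2: A pushes values into B's local store."""

    def __init__(self, upstream: Origin):
        self.store = {}
        upstream.subscribers.append(self)

    def on_push(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)


def broadcast_cycle(broker: Source, keys):
    """B's cyclic broadcast to C: the same shape whichever way B is fed."""
    return [(key, broker.get(key)) for key in keys]


def listen(cycle, wanted):
    """Node C: scan the broadcast for the wanted key; it knows nothing about A."""
    return next((value for key, value in cycle if key == wanted), None)


a1 = Origin({"x": "1"})
print(listen(broadcast_cycle(PullBroker(a1), ["x"]), "x"))  # 1

a2 = Origin({})
b2 = PushBroker(a2)
a2.update("x", "1")
print(listen(broadcast_cycle(b2, ["x"]), "x"))  # 1
```

The `listen` function is identical in both configurations; only the A-to-B link differs, which is the property the text calls network transparency.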

Of  course,  nothing  is  ever  simple.  In  some  cases,  brokers 
can  also  be  sources  by  maintaining  their  own  databases.  In 
this  case,  the  hybrid  broker  can  add  data  of  its  own  to  what 
it  receives  from  its  upstream  counterparts.  The  principle 
of  network  transparency  also  protects  clients  from  having 
to  depend  on  this  situation.  A  data  source,  be  it  a  pure 
source,  a  broker,  or  a  hybrid  source,  only  guarantees  that  it 
can provide specific data, independently of where it comes
from.


3.2  Caching 

While nodes can perform many functions, the most ubiquitous
data management facility is caching. Unlike caching in
client-server systems, the path from data sources to a client
can be of length greater than two. Thus, items might be
cached at any of many points along the data path in the
network, so caching in this context resembles the kind
of proxy caching that one might find in a wide-area network
(e.g., the Internet).

While the problems here are very similar to those of
any proxy caching scheme, the broad view of data movement
available in a DBIS makes the potential solutions much
richer. For example, if there are copies of a particular data
item in multiple caches, there will always be an issue of
how those copies are refreshed when the primary copy is
updated. One solution is to send invalidations to each client
cache manager. An invalidation message results in the purge
of the item from the cache. Alternatively, the new value
could be propagated to the client cache managers. For typical
client/server systems, invalidation is usually preferable.
However, in our broadcast disk studies [Acha96b] we showed
that for periodic broadcast, performance can often be improved
using propagation.
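The two refresh strategies can be sketched as two message types handled by one client cache manager. This is an illustrative Python sketch with hypothetical names (Invalidate, Propagate, ClientCache); it only shows where each strategy costs a miss, not the performance results reported in [Acha96b].

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Invalidate:
    key: str


@dataclass
class Propagate:
    key: str
    value: str


Message = Union[Invalidate, Propagate]


class ClientCache:
    """Hypothetical client cache manager that accepts either refresh style."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.misses = 0

    def apply(self, msg: Message) -> None:
        if isinstance(msg, Invalidate):
            self.entries.pop(msg.key, None)      # purge: the next read must fetch again
        elif isinstance(msg, Propagate):
            if msg.key in self.entries:          # refresh only items already cached
                self.entries[msg.key] = msg.value

    def read(self, key: str, fetch: Callable[[str], str]) -> str:
        if key in self.entries:
            return self.entries[key]
        self.misses += 1                         # a miss costs a request (or a wait for the broadcast)
        value = fetch(key)
        self.entries[key] = value
        return value
```

After an `Invalidate`, the next read of the item is a miss; after a `Propagate`, the next read is a hit carrying the new value.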

The  decision  about  how  current  to  keep  the  cached  copies 
is  the  same  as  in  other  caching  mechanisms.  Once  that  has 
been  decided,  the  means  by  which  it  is  achieved  can  vary. 
In  a  DBIS,  we  could  propagate  (i.e.,  push)  the  changes  to 
the  clients  or  wait  for  the  client  to  request  the  item  again 
(i.e.,  pull).  In  the  latter  case,  if  a  cache  manager  cares  about 
keeping  items  very  current,  it  will  have  to  poll  the  state  of 
the  object  often.  It  is  interesting  to  note  that  if  the  data 
delivery  mechanism  in  a  DBIS  changes,  the  means  by  which 
updates  are  propagated  (or  not)  may  also  need  to  change. 

Deciding which object to evict from the cache when a
new candidate arrives is another issue that must be addressed
by any cache manager. Many systems use some
form of LRU for this purpose. We have shown in previous
work [Acha95a] that for some styles of data delivery (e.g.,
broadcast disks), LRU is not the most effective choice. For
cyclic data delivery, in which different items can have different
arrival frequencies, a cost-based caching scheme performs
significantly better.
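The difference between LRU and a cost-based policy shows up even with two cached pages. The sketch below assumes a cost of access probability divided by broadcast frequency, so a page that is rarely needed, or that will come around again soon, is cheap to drop. This is a simplified stand-in in the spirit of the scheme cited above, not a transcription of it; function names and numbers are illustrative.

```python
from collections import OrderedDict
from typing import Dict, Iterable


def lru_victim(recency: "OrderedDict[str, None]") -> str:
    """LRU: evict the entry touched longest ago (front of the OrderedDict)."""
    return next(iter(recency))


def cost_based_victim(
    cached: Iterable[str], prob: Dict[str, float], freq: Dict[str, float]
) -> str:
    """Evict the page that is cheapest to lose.

    Assumed cost = access probability / broadcast frequency: low when the page
    is seldom needed or when it will reappear on the broadcast soon.
    """
    return min(cached, key=lambda page: prob[page] / freq[page])
```

For example, with `prob = {"hot": 0.5, "cold": 0.3}`, `freq = {"hot": 4.0, "cold": 1.0}` and `cold` touched before `hot`, the cost of `hot` is 0.5 / 4 = 0.125 and that of `cold` is 0.3, so the cost-based policy evicts `hot` (it returns four times per cycle), while LRU evicts `cold`, which may then take a full cycle to reappear.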

In a DBIS, the modes of data delivery might change.
In such an environment, the caching policy could change
to match the prevailing conditions. We will need heuristics
for deciding the appropriate caching policies for a particular
configuration of distributed components. As an example, if
node B initially pulls data from node A, B might reasonably
use LRU as its caching policy. When A creates a broadcast
disk which is read by B, B might then change its caching
policy to a cost-based scheme similar to the one that we
propose in [Acha95a].

3.3  Value-Added  Nodes 

Some nodes may also add value to data as it passes through,
by performing specific computations on that data. The computations
can be simple or complex, and they can act on single
values or sets of values. Other nodes may simply pass values
on to other nodes.

As an example, suppose node A pushes stock prices for
Fortune 500 companies that are picked up by node B. Node
B keeps a database of previous stock prices, and when a new
price for the day is picked up from node A, it calculates the
difference between the most current price and yesterday's
close and pushes this value out to yet another community.
Node B is a push-based, value-added server. Of course, it
need not be based on push. Other clients could pull stock
deviations from B as well.

Another kind of value-added service that a node can perform
is merging of values from multiple sources. Merging can
occur in several ways. The first involves multiple sources
that maintain similar information. The merge node can
make the most reliable or most current version of a value
available. Alternatively, multiple sources may maintain a
set of values which the merge node combines into a single
value. An example of this might involve nodes that maintain
demographic information for towns, including their current
population. Another node may read these values and consolidate
them into a single population figure for the state.

Nodes  can  also  perform  the  service  of  filtering.  A  filtering 
node  will  receive  a  large  volume  of  data  from  another  node, 
only  some  fraction  of  which  it  makes  available  to  its  clients. 
For  example,  a  node  could  receive  all  stock  prices  from  the 
NYSE  and  provide  information  about  only  the  Fortune  500 
stocks  to  its  clients. 
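The three value-added services described in this section (computation, merging, filtering) reduce to small functions over the data passing through a node. A minimal Python sketch follows; the function names, the (key, timestamp, value) report format, and "newest timestamp wins" as the merge rule are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def price_deltas(latest: Dict[str, float], prev_close: Dict[str, float]) -> Dict[str, float]:
    """Computation: today's latest price minus yesterday's close, per symbol."""
    return {sym: px - prev_close[sym] for sym, px in latest.items() if sym in prev_close}


def merge_most_current(reports: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Merging similar information: keep the newest value per key.

    Each report is (key, timestamp, value) as received from some source.
    """
    newest: Dict[str, Tuple[float, float]] = {}
    for key, ts, value in reports:
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)
    return {key: value for key, (_, value) in newest.items()}


def consolidate(town_population: Dict[str, int]) -> int:
    """Merging complementary information: town figures become one state figure."""
    return sum(town_population.values())


def filter_quotes(quotes: Iterable[Tuple[str, float]], wanted: Set[str]) -> List[Tuple[str, float]]:
    """Filtering: pass on only the symbols the downstream clients care about."""
    return [(sym, px) for sym, px in quotes if sym in wanted]
```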


3.4  Recoverable  Nodes 

Often it will be useful to make guarantees about the reliability
of some node. Thus, nodes that implement some degree
of recoverability will be a useful component in a DBIS. Consider
a node that must guarantee the delivery of the latest
version of IBM's stock price. Such a node must not lose its
information in the event of a failure. That is, if the information
was received, then the node must be able to guarantee
that it will eventually be made available to its clients.

Of course, having recoverable brokers is not enough on
its own to guarantee that nodes will not miss disseminated
information while they are down. In order to address this
issue, a scheme like reliable multicasting would have to be
used. Reliable multicasting will eventually deliver all messages,
but it cannot make real-time guarantees about when
an object will arrive.
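One standard way to meet the "received implies eventually available" guarantee is a write-ahead log: force each update to stable storage before treating it as received, and replay the log after a restart. The sketch below assumes a JSON-lines log file; the class name and file layout are hypothetical, and, as the text notes, it does nothing about messages disseminated while the node is down.

```python
import json
import os
from typing import Dict


class RecoverableNode:
    """Hypothetical node whose received values survive a crash via a log."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.latest: Dict[str, float] = {}
        self._recover()

    def _recover(self) -> None:
        """Replay the log to rebuild the latest value of every key."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write from a crash: nothing after it was acknowledged
                self.latest[record["key"]] = record["value"]

    def receive(self, key: str, value: float) -> None:
        """Make the update durable first; only then count it as received."""
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.latest[key] = value

    def serve(self, key: str) -> float:
        return self.latest[key]
```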

3.5  The  Burden  of  Push 

As  mentioned  in  Section  2.3,  any  node  that  provides  a  push 
service  must  do  so  on  the  basis  of  some  knowledge  of  the 
access  patterns  of  its  client  base.  If  the  node  pushes  data 
that  few  clients  care  about,  then  bandwidth  is  wasted.  The 
trick  is  to  broadcast  items  that  are  of  interest  to  a  large 
segment  of  the  user  community.  This,  of  course,  is  only 
possible  if  there  is  high  commonality  of  interest  for  at  least 
some  data  items. 

In order to optimize its push schedule, the server must
rely on profiles of user needs. Profiles could be learned by
servers if clients provide feedback about the effectiveness of
the push schedule. Alternatively, a client could communicate
a profile to the server at appropriate times, such as
when it begins to listen to the push, at regularly scheduled
intervals, or whenever the client notices that the current
schedule deviates significantly from what it would like to
see.

What would such a profile look like? A profile is very
much like a continuously executing query [Terr92]. In other
words, it is a predicate that indicates the items that the
client would like to see. It is continuously executing because
the server will push items as long as there are currently valid
profiles that match the items.
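Treating a profile as a continuously executing query amounts to re-evaluating every valid profile's predicate against each new item. A minimal sketch with hypothetical names (Profile, ProfileServer), using the reading in which every match is delivered:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, object]


@dataclass
class Profile:
    owner: str
    predicate: Callable[[Item], bool]  # the "query": which items this client wants
    valid: bool = True                 # a profile stays in force until withdrawn


class ProfileServer:
    """Each newly published item is checked against every currently valid profile."""

    def __init__(self) -> None:
        self.profiles: List[Profile] = []
        self.delivered: Dict[str, List[Item]] = {}

    def subscribe(self, profile: Profile) -> None:
        self.profiles.append(profile)

    def publish(self, item: Item) -> None:
        for profile in self.profiles:
            if profile.valid and profile.predicate(item):
                self.delivered.setdefault(profile.owner, []).append(item)
```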

Profiles can be interpreted to mean that whenever a new
item that matches a profile is added to the database, the
owner of that profile will receive the new data. On the
other hand, the profile could be treated more as a hint to
the server indicating interest with no requirement on the
server's part to send matching items. In this case, the server
may choose to conserve bandwidth and not send a matching
item in order to best serve the client community as a whole.
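The two interpretations differ only in what the server does once matches are known. In this sketch (names hypothetical), `budget` is an assumed limit on how many items the server transmits on a shared channel, where one transmission reaches every interested client; under the hint reading the server spends that budget on the items that match the most profiles.

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[dict], bool]


def plan_sends(items: List[dict], profiles: Dict[str, Predicate],
               budget: int, as_hint: bool) -> List[Tuple[str, str]]:
    """Return the (client, item id) deliveries the server will make.

    Strict reading: every item matching a profile is delivered to its owner.
    Hint reading: at most `budget` items go out, preferring those that match
    the most profiles, so bandwidth serves the community as a whole.
    """
    matches = {item["id"]: [client for client, pred in profiles.items() if pred(item)]
               for item in items}
    matched_ids = [item_id for item_id, clients in matches.items() if clients]
    if as_hint:
        # sorted() is stable, so items with equal interest keep their arrival order
        matched_ids = sorted(matched_ids, key=lambda i: len(matches[i]), reverse=True)[:budget]
    return [(client, item_id) for item_id in matched_ids for client in matches[item_id]]
```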

4  Systems  Viewed  as  DBIS 

In  this  section,  we  describe  some  existing  systems  using  the 
concepts  of  our  DBIS  framework. 

4.1  PointCast 

PointCast is a dissemination service that has attracted a
large population of users. It obtains profiles from users
that describe their interests, and then uses these profiles
to assemble and update customized “newspapers” from a
database of current stories.

The PointCast system has been touted as one of the first
push-based systems. This is not exactly true. Other systems
such as Teletext [Amma85], BCIS at MIT [Giff90], and
Datacycle [Herm87] used push long before PointCast. However,
PointCast was one of the first push-based systems to
achieve widespread use. It is instructive, therefore, to see
exactly how push is used in PointCast 1.0⁴.

From the point of view of a DBIS, the use of push within
PointCast is extremely limited. In fact, in terms of the network
architecture, push is non-existent; that is, the flow
of requests and responses within the global architecture is
pull-based. The PointCast client on a user's workstation
generates requests for news stories that match the user's
profile. For example, if the user indicates an interest in the
computer industry, the PointCast client polls the PointCast
server for news stories with the keyword “computer industry”
whenever the PointCast screen saver is enabled. All of
these requests can generate a great deal of network traffic.

So, where's the push? If we look at Figure 3, we see
that there are essentially two processes on the client machine.
One of these processes is responsible for pulling the
latest news stories down to the user's machine, and the
other is responsible for displaying these stories on the user's
screen. The push really occurs between these two components.
When the pull-based story acquisition module gets
a new story, it pushes it to the screen manager. From the
user's point of view, this is push because things are happening
on the screen without any intervention. The use of push
as a technique for managing heavy network loads, however,
is not part of the design.

⁴ Hereafter referred to as PointCast.
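The PointCast arrangement can be mimicked with two threads and a queue: the fetcher pulls from the server, and the hand-off to the display is the only push. This is a hypothetical Python sketch, not PointCast code; threads stand in for the two client processes, and `server_batches` stands in for successive server replies.

```python
import queue
import threading
import time
from typing import Callable, List, Optional, Set


def story_fetcher(poll: Callable[[], List[str]], out: "queue.Queue[Optional[str]]",
                  rounds: int, interval_s: float = 0.0) -> None:
    """Pull side: poll the server periodically and forward stories not seen before."""
    seen: Set[str] = set()
    for _ in range(rounds):
        for story in poll():          # request/response with the server: this is pull
            if story not in seen:
                seen.add(story)
                out.put(story)        # local hand-off to the display: the only "push"
        time.sleep(interval_s)        # the polling interval is a user setting
    out.put(None)                     # sentinel: tell the display to stop


def screen_manager(inp: "queue.Queue[Optional[str]]", shown: List[str]) -> None:
    """Display side: reacts to whatever arrives, with no user action."""
    while True:
        story = inp.get()
        if story is None:
            break
        shown.append(story)


def run_client(server_batches: List[List[str]]) -> List[str]:
    """Wire the two sides together; each batch plays the role of one poll reply."""
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    shown: List[str] = []
    replies = iter(server_batches)
    display = threading.Thread(target=screen_manager, args=(q, shown))
    display.start()
    story_fetcher(lambda: next(replies, []), q, rounds=len(server_batches))
    display.join()
    return shown
```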

4.2  Broadcast  Disks 

Our  own  work  on  broadcast  disks  is  based  on  a  model  of 
data  delivery  that  is  virtually  the  direct  opposite  of  that 
described  above  for  PointCast  (see  Figure  4). 

In our model, an application process on the client workstation
behaves exactly as it would in a traditional pull-based
environment. It generates pull requests as it needs
data and blocks until that data is received.

The server, however, proactively sends data to the client
community in advance of any request (i.e., push). A process
on the client listens to the broadcast stream and picks up
data items for which the application might be waiting. Thus,
the places where pushes and pulls happen are inverted
relative to the PointCast case.
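The inverted arrangement can be sketched as a client-side listener: the application keeps issuing blocking, pull-style requests, but they are satisfied from the server's push stream. Names are hypothetical; a real client would read a network broadcast rather than have `on_broadcast` called directly.

```python
import threading
from typing import Dict, Set


class BroadcastListener:
    """Client-side process that picks items off a server-scheduled broadcast.

    This sketch assumes at most one outstanding request per key.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._wanted: Set[str] = set()
        self._arrived: Dict[str, bytes] = {}

    def request(self, key: str, timeout: float = 5.0) -> bytes:
        """Application side: a pull-style call that blocks until the item goes by."""
        with self._cond:
            self._wanted.add(key)
            if not self._cond.wait_for(lambda: key in self._arrived, timeout=timeout):
                self._wanted.discard(key)
                raise TimeoutError(f"{key} did not appear on the broadcast")
            self._wanted.discard(key)
            return self._arrived.pop(key)

    def on_broadcast(self, key: str, value: bytes) -> None:
        """Called for every item on the push channel; keeps only awaited items."""
        with self._cond:
            if key in self._wanted:
                self._arrived[key] = value
                self._cond.notify_all()
```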

It should be noted that in the broadcast disk case, the
push is periodic and is scheduled by the server. In the PointCast
case, the pull is also periodic, but the interval is set by
the user.

4.3  SIFT 

The SIFT [Yan95] system was developed at Stanford University
as a way to disseminate documents to a user community.
SIFT combines data management ideas from information
retrieval with a publish/subscribe model for dissemination.
We describe the way the publish/subscribe model works in
terms of our DBIS architecture.

Looking at Figure 5, we see three active components: the
document source, the SIFT server, and a SIFT client (one of
potentially many). The connection between the document
source and the SIFT server (on the left side of the figure) is
push-based, unicast, and aperiodic. The document source
could alternatively deliver new documents through a 1-to-n
broadcast medium, such as a satellite feed, if there were
multiple interested recipients (SIFT servers or otherwise).
A backchannel (not shown in the figure) is used only to set
up the initial connection. Thereafter, the document source
forwards all new documents to the SIFT server. No
filtering happens on this link. We could think of the
profile held at the document source for the SIFT server as
being “send everything.”

Figure 3: PointCast 1.0

Figure 4: Broadcast Disks

The connection between the SIFT server and a given
SIFT client (shown on the right side of the figure) is also
push-based, unicast, and aperiodic. In this case, though, the
client profile that is held at the SIFT server is customized for
each client. It consists of a series of keywords and weights
that describe documents of interest to that client. The SIFT
server provides novel technology for indexing client profiles.
Such an index is used for matching profiles against newly arriving
documents. This indexing technique allows the server
to accommodate a large client population with reasonable
performance. The original SIFT prototype disseminated entire
articles to clients. With the existence of the web, it
becomes possible to send short article descriptions plus the
corresponding URLs to conserve bandwidth.

It should be noted that clients get exactly what their
profiles specify and nothing more. This is in contrast to a 1-to-n
(broadcast) style of delivery in which all clients see the
same information stream. There, it is the server's responsibility to
optimize this stream to suit the needs of the largest number
of users. It is unlikely that such a stream will be optimal
for any one user.
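The general idea of a profile index can be illustrated with an inverted index from keywords to profiles, so that an arriving document is scored only against profiles that share at least one of its terms. This is a generic sketch under our own assumptions (a profile is a keyword-to-weight map with a threshold, and a document matches when the weights of its terms sum to at least that threshold); it is not SIFT's actual indexing technique.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class ProfileIndex:
    """Inverted index from keyword to (client, weight) postings."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, List[Tuple[str, float]]] = defaultdict(list)
        self._threshold: Dict[str, float] = {}

    def add_profile(self, client: str, weights: Dict[str, float], threshold: float) -> None:
        self._threshold[client] = threshold
        for term, weight in weights.items():
            self._postings[term.lower()].append((client, weight))

    def match(self, document: str) -> Set[str]:
        """Score only the profiles that share at least one term with the document."""
        terms = set(document.lower().split())
        scores: DefaultDict[str, float] = defaultdict(float)
        for term in terms:
            for client, weight in self._postings.get(term, []):
                scores[client] += weight
        return {client for client, score in scores.items() if score >= self._threshold[client]}
```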

5  Putting  it  All  Together 

In the preceding discussion, we described a vision of how distributed
information systems should be built in the future.
Our framework focused on techniques for delivering data in
wide-area network settings in which nodes and links exhibit
extreme variation in their operating parameters. By adjusting
the delivery mechanism to match these characteristics,
we believe that we can achieve high performance and scalability
without the need to invest in additional hardware. In
this section, we briefly discuss our approach to this problem
and outline some of the open research questions.

5.1  Toolkit  Approach 

We intend to realize our solutions to the problems of designing
a DBIS through a toolkit that provides the proper components
from which any DBIS could be built. This toolkit
can be thought of as a set of object classes that support
concepts such as network connections and local caches.

A  key  part  of  the  toolkit  will  be  a  set  of  classes  to  allow 
distributed  nodes  to  negotiate  in  order  to  establish  a  proper 
connection.  This  is  required  at  several  levels.  At  the  highest 
level,  the  nodes  must  agree  on  how  data  is  to  be  transferred. 
A  client  node  that  is  relying  on  data  from  some  server  must 
know  whether  that  server  will  be  using  push  or  accepting 
requests.  There  are  also  handshaking  protocols  that  must 
occur  at  lower  levels.  For  example,  if  a  push-based  broadcast 
connection  is  to  be  established  in  an  Ethernet,  the  nodes 
must  agree  on  which  Ethernet  address  will  be  used  for  that 
broadcast.  The  parties  must  also  agree  on  the  parameters 
that  will  be  used  to  configure  that  broadcast.  For  example, 
if  it  is  a  broadcast  disk,  the  frequency  of  broadcast  of  each 
item  is  of  interest  to  the  clients. 
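The two levels of agreement can be sketched as a server offering delivery options and a client accepting the first one it can use. The message format, field names, and the example group address in the usage note are hypothetical; the point is that a push offer must carry the lower-level parameters (where to listen, how often each item appears) before a client can accept it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Offer:
    """One way the server is willing to deliver data (hypothetical message format)."""
    mode: str                                                     # "push" or "pull"
    group_address: Optional[str] = None                           # where a push broadcast is sent
    item_frequency: Dict[str, int] = field(default_factory=dict)  # broadcasts per cycle


@dataclass
class Agreement:
    mode: str
    group_address: Optional[str]
    item_frequency: Dict[str, int]


def negotiate(offers: List[Offer], client_modes: Set[str],
              needed_items: Set[str]) -> Optional[Agreement]:
    """Accept the first offer (in the server's order of preference) the client can use."""
    for offer in offers:
        if offer.mode not in client_modes:
            continue                      # highest level: agree on push vs. pull
        if offer.mode == "push":
            if offer.group_address is None:
                continue                  # lower level: the client must know where to listen
            if not needed_items.issubset(offer.item_frequency):
                continue                  # every needed item must be on the broadcast
        return Agreement(offer.mode, offer.group_address, dict(offer.item_frequency))
    return None
```

For example, given `Offer("push", "01:00:5e:00:00:01", {"a": 4, "b": 1})` followed by `Offer("pull")`, a client that needs item `a` and supports both modes agrees to the push broadcast, while a pull-only client falls through to the pull offer.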

The usefulness of a toolkit will rely on the precise definition
of the DBIS classes. These classes must be of general
utility. Also, as indicated in Section 2.3, the definition of
these classes must be based on a substantial body of experimental
results that help to delineate the sometimes subtle
tradeoffs.

Figure 5: SIFT (Document Source → SIFT Server → SIFT Client; both links push, unicast, aperiodic)

5.2  Dynamic  Reconfiguration 

A  network  can  be  characterized  by  prevailing  loads  on  the 
nodes  and  the  connections.  This  characterization  changes 
rapidly,  and  a  responsive  DBIS  must  be  able  to  adapt  to 
these  changes.  Thus,  our  vision  of  a  fully  functional  DBIS 
includes  facilities  to  support  the  dynamic  reconfiguration  of 
the  data  delivery  mechanisms. 

A key element of a reconfiguration facility is a statistics
gathering component that collects the right performance
numbers and that can intelligently select among the available
delivery options. This is not a simple matter. Our previous
experiments in the area of broadcast disks have shown
that the design space here is very complex, with many places
in which intuitions from more traditional distributed system
design often produce poor results.
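A minimal statistics-gathering component might count requests per item over an observation window and propose which items to move from pull to push. The thresholds and capacity are placeholders, and the rule itself is a hypothetical heuristic; as noted above, intuition is a poor guide here, so such parameters would need to come from experiments.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


class DeliveryStats:
    """Per-item request statistics gathered over one observation window."""

    def __init__(self) -> None:
        self.requests: Counter = Counter()
        self.requesters: DefaultDict[str, Set[str]] = defaultdict(set)

    def record(self, item: str, client: str) -> None:
        self.requests[item] += 1
        self.requesters[item].add(client)

    def choose_push_set(self, min_requests: int, min_clients: int, capacity: int) -> Set[str]:
        """Items requested often by many distinct clients move to push; the rest stay pull."""
        candidates: List[str] = [
            item for item, n in self.requests.items()
            if n >= min_requests and len(self.requesters[item]) >= min_clients
        ]
        candidates.sort(key=lambda item: self.requests[item], reverse=True)
        return set(candidates[:capacity])

    def reset(self) -> None:
        """Start a new window so the choice tracks changing load."""
        self.requests.clear()
        self.requesters.clear()
```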

5.3  Some  Design  Issues 

In addition to the plumbing issues that we have discussed
so far, there are some higher-level issues that must be addressed
in developing an integrated DBIS. In the following,
we briefly outline some of these issues:

•  Bandwidth  Allocation  -  For  a  given  link,  policies  are 
needed  for  allocating  bandwidth  among  the  various 
data  delivery  mechanisms. 

• Push Scheduling - For the push-based approaches, intelligent
scheduling is necessary in order to obtain the
maximal benefit from the available bandwidth. Scheduling
must also take into account the likelihood and
distribution of transmission errors. Also, for periodic
push, the broadcast should include index and/or schedule
information that describes the objects that are to
appear in the upcoming broadcast. Such information
allows clients to minimize the amount of time and/or
processing they devote to monitoring the broadcast
and can aid in storage management decisions.

• Client Storage Management - Clients must allocate
their storage resources among the data obtained through
the various delivery mechanisms. Furthermore, as stated
earlier, different methods of data delivery impose differing
demands on the policies for client caching and
prefetching. In addition, in some cases (e.g., mobility),
storage management must also take into account
the likelihood of disconnection and of data becoming
stale due to updates or expiration.

• User Profiles and Feedback - Profiles of client needs are
key for making allocation, scheduling and other policy
decisions at both clients and servers. The form of the
profiles will be important to achieve the most effective
use of the medium. For example, access probabilities
are one specific representation of the client needs. The
server must also have effective models for combining
client profiles. The integration of a backchannel from
clients to servers is needed to allow for updating profiles
and making additional requests.

• Security Issues - Another set of important issues that
must be addressed revolves around the security and
privacy concerns that arise in any distributed information
system. The emphasis on one-to-N communication
in a DBIS, however, increases the significance
of such issues.

• Consistency Issues - The final issue we list here is the
maintenance of data consistency, particularly in the
face of possibly intermittent connectivity. Two types of
consistency must be considered. First, guarantees on
the timeliness of individual data items must be provided
if required by the clients. Second, mutual consistency
across multiple items will be required in some
instances. Both types of consistency must be provided
in a flexible manner, so that tradeoffs between consistency
and responsiveness can be made on a case-by-case
basis.
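For periodic push, one way to give hotter items higher frequency is a multi-disk broadcast program: pages are grouped into "disks", each disk gets a relative frequency, and chunks of the disks are interleaved over a major cycle. The sketch below assumes the major cycle has as many minor cycles as the least common multiple of the relative frequencies and splits each disk into unpadded, equal-sized chunks; these are our simplifications. Because the program is fixed, it can itself be published as the schedule information mentioned in the push scheduling item, and `slots_until` (a hypothetical helper) shows how a client could use it to stop monitoring until the page it needs comes around.

```python
from functools import reduce
from math import gcd
from typing import List


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def broadcast_program(disks: List[List[str]], rel_freq: List[int]) -> List[str]:
    """Build one major cycle of a multi-disk broadcast.

    disks[i] lists the pages on disk i (hottest disk first); rel_freq[i] is how
    many times disk i appears per major cycle. Chunks are not padded, so an
    uneven split simply yields shorter (or empty) chunks.
    """
    minor_cycles = reduce(lcm, rel_freq)
    chunked: List[List[List[str]]] = []
    for pages, freq in zip(disks, rel_freq):
        n_chunks = minor_cycles // freq          # slower disks are cut into more chunks
        size = -(-len(pages) // n_chunks)        # ceiling division
        chunked.append([pages[k * size:(k + 1) * size] for k in range(n_chunks)])
    program: List[str] = []
    for minor in range(minor_cycles):
        for chunks in chunked:                   # each minor cycle visits every disk once
            program.extend(chunks[minor % len(chunks)])
    return program


def slots_until(program: List[str], position: int, page: str) -> int:
    """Using the published program, count slots until `page` next appears after `position`."""
    n = len(program)
    for step in range(1, n + 1):
        if program[(position + step) % n] == page:
            return step
    raise KeyError(page)
```

For example, `broadcast_program([["a"], ["b", "c"]], [2, 1])` yields `["a", "b", "a", "c"]`: page `a` goes out twice per cycle, while `b` and `c` go out once each.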

6  Related  Work 

Work on distributed object computing has generated many
important standards and systems. CORBA [OMG91] and
DCE [OSF94], for example, are two important approaches
to system interoperability. This work is not incompatible
with the notion of a DBIS. A DBIS can be thought of as
infrastructure for such object-oriented middleware.

There is much previous work that relates to the architectural
issues of a DBIS. The brief discussion that follows
samples some of the work that is most related to the issues
presented in this paper.

The management of data in distributed settings has a
long history. The preponderance of previous work assumes
that data is requested when needed (i.e., pull) and that
servers respond to these requests in an orderly fashion. Some
of this work has occurred in a client/server database setting
[Fran96a], while other work has been done in the distributed
file system context [Levy90]. There has been a lot of work
on caching in these environments, much of which has focused
on the maintenance of cache consistency in the face
of updates.

More recently, there has been work on data management
issues for wireless environments [Katz94]. Some of the work
in this area has focused on satellite-based systems [Dao96,
Dire96] in which the downstream bandwidth is quite high.

The idea of the publish/subscribe model as a dissemination
mechanism has been used in many contexts, including
SIFT [Yan95] and the Information Bus [Oki93].

There has also been work on broadcasting in Teletext systems
[Amma85, Wong88]. [Wong88] presents an overview
of some of the analytical studies on one-way, two-way and
hybrid broadcast in this framework.

The Datacycle Project [Bowe92, Herm87] at Bellcore investigated
the notion of using a repetitive broadcast medium
for database storage and query processing. An early effort
in information broadcasting, the Boston Community
Information System (BCIS), is described in [Giff90]. BCIS
broadcast news articles and information over an FM channel
to clients with personal computers specially equipped
with radio receivers. Both Datacycle and BCIS used a flat
broadcast (i.e., all items have the same frequency). The mobility
group at Rutgers [Imie94a, Imie94b] has done significant
work on data broadcasting in mobile environments. A
main focus of their work has been to investigate novel ways
of indexing in order to reduce power consumption at the
mobile clients. Some recent applications of dissemination-based
systems include information dissemination on the Internet
[Yan95, Best96], and Advanced Traveler Information
Systems [Shek96].

Our work on Broadcast Disks differs from these in that
we consider multi-level disks and their relationship to cache
management. In [Acha95a], we proposed an algorithm to
generate Broadcast Disk programs and demonstrated the
need for cost-based caching in this environment. Recently,
[Baru96] gave an algorithm to determine the parameters
controlling a broadcast program. In [Acha96a], we showed
how opportunistic prefetching by the client can significantly
improve performance over demand-driven caching. More recently,
in [Acha96b], we studied the influence of volatile data
on client performance and showed that the Broadcast Disk
environment can be made very robust in the presence of updates.
In [Acha97], we explored the tradeoff between cyclic
broadcast and pull.

7  Conclusions 

The increasing ability to interconnect computers through
internetworking, mobile and wireless networks, and high-bandwidth
content delivery to the home has resulted in
a proliferation of dissemination-oriented applications. A
key attribute of many such applications is their huge scale.
These applications present new challenges for data management
throughout all components of a distributed information
system. We have proposed the notion of a dissemination-based
information system that integrates many different data
delivery mechanisms and types of information brokers. We
described some of the unique aspects of such systems and
discussed how several existing dissemination-based architectures
fit into the DBIS model.

The ideas presented in this paper have grown out of our
previous work on the Broadcast Disks paradigm for data
delivery. A key lesson from that work was the importance
of applying a data management perspective to distributed
systems architecture issues. We are currently completing
a prototype that combines the push-based Broadcast Disks
with a pull-based broadcast model. We view that prototype
as the first step in the development of a generic DBIS
toolkit that will support the creation of a variety of large-scale
dissemination-based applications across several different
communication media.
Abstract

Push technology has recently generated a tremendous
amount of media attention, commercial activity, and controversy.
The wide range of opinions on push is understandable
given that it represents a major departure from the way
distributed information systems have traditionally been built.
Adding to the noise, however, is confusion about the basic
principles of push and where it fits into the world of data delivery.
For example, many discussions on the topic blur the
distinction between push and broadcast. We argue that this
confusion stems from two fundamental causes. First, push
is just one dimension of a larger design space of data delivery
mechanisms. Second, networked information systems can
employ different data delivery options between different sets
of information producers and consumers. In this short paper
we characterize the design space for dissemination-based information
systems and applications, and show how current
“push” solutions fit into this space. We then use this framework
to highlight how the implementation of current Internet-based
push solutions differs from the appearance that they
present to users.

1  Introduction 

Push technology stems from a very simple idea. Rather than
requiring users to explicitly request (i.e., “pull”) the information
that they need, data can be sent to users without having
them specifically ask for it. The advantages of push are
straightforward. The traditional pull approach requires that
users know a priori where and when to look for data or that
they spend an inordinate amount of time polling known sites
for updates and/or hunting on the network for relevant sites.
Push relieves the user of these burdens. The problems of
push are also fairly obvious. Push transfers control from the
users to the data providers, raising the potential that users receive
irrelevant data while not receiving the information they
need. These potential problems can arise due to issues ranging
from poor prediction of user interests to outright abuse
of the mechanism, such as “spamming”. The “in-your-face”
nature of push technology is the root of both its potential benefits
and disadvantages.

Push technology has been around in various forms for as
long as people have been communicating. Examples range
from newspapers, to telephones, to radio and television, to
E-mail. Early work on using computer networks for pushing
data was performed in the 1980s. The Boston Community
Information System at MIT [Giff90], Teletext systems for
distributing data over broadcast media [Amma85, Wong88],
and the Datacycle database machine [Herm87] are all examples
of systems that incorporated some form of push technology.
Recently, however, the combination of push technology
with the Internet and Web (sometimes referred to as Webcasting)
has generated a groundswell of excitement, commercial
activity, and controversy.

1.1  The  Push  Phenomenon 

In February 1996, PointCast made its client software available
for free downloading over the Internet, setting off a
wave of interest in push technology. The idea was appealing:
rather than using your idle desktop machine as a display
ground for flying toasters, PointCast would turn it into
an active information terminal that would display headlines,
weather forecasts, stock prices, sports scores, etc., with the
appearance of having real-time updates. By specifying a profile,
users could indicate their interests to the system, and the
display would be tailored to these interests.

For anyone who tried the software, the reaction was immediate;
this represented a paradigm shift in the way one
could think about using the Internet as an information delivery
tool. Push technology on the Internet represented a new
and untapped medium. The computer trade press became inundated
with articles about push technology, and dozens of
companies touting push-based solutions arrived on the scene.
A new jargon of data delivery was developed, with terminology
borrowed from broadcast media. Users of push technology
could tune into channels that contained broadcasts of information
on particular topics.

By the end of 1996, the excitement had spilled over into the mainstream press. A steady stream of articles about push technology appeared in venues such as the New York Times and the Wall Street Journal.¹ In February 1997, Business Week magazine published a Special Report section entitled “A Way Out of the Web Maze”, which argued that Webcasting could solve many of the Web's problems, such as information overload and the inability of users to find the data they need. Similar sentiments were echoed by numerous vendors and technology pundits.

The peak of the media hype for push technology was reached in March 1997, when the cover article of Wired magazine blared: “Push! Kiss your browser goodbye”. This article began by declaring: “Remember the browser war between Netscape and Microsoft? Well forget it. The Web browser itself is about to croak. And good riddance.” While the article was certainly provocative and clearly overstated, the argument it made was simply that push technology would change the Web from a passive library of information into a networked, immersive medium for information and entertainment delivery. Despite this simple message, the article seemed to epitomize both the promise of push technology and the potential for overselling its virtues.

1.2 The Inevitable Backlash

Around the time of the Wired article, the voices of dissent began to make themselves heard. A March 1997 New York Times CyberTimes article by James Gleick stated: “... the promotion of Push is the silliest piece of puffery to waft along in several seasons. ... The failure of Push is preordained”. A July 1997 article in the on-line magazine Webmonkey (published by the same company that publishes Wired) was entitled simply “Why Channels Suck”. A somewhat more technical article at the CNET on-line site, entitled “Networks Strained By Push”, described a study indicating that push technologies were using an inordinate portion of corporate network bandwidth. Finally, a Byte magazine article in August 1997 had the tag line: “Web push technology is exploding, even though there’s no such thing.” The Byte article went on to explain (correctly) that current push technology is “really pull++”.

¹ Many of these articles had titles such as “When Push Comes to Shove”, “The Pull of Push”, or “X Gets Pushy” (where X is some product or company). The observant reader will notice that we have resisted such temptations for this paper.


1.3 The Current Situation

Recently, the media turmoil over push has settled down, and expectations for the technology (at least for the short term) have been lowered to arguably more reasonable levels. Still, the commercial activity in the area is impressive. As of January 1998, a register of push technology vendors listed 49 companies with announced products (see David Strom’s site at http://www.strom.com/imc/t4a.html). Many other companies that have not yet announced products are working on push-based solutions. The major Web browser vendors, Netscape and Microsoft, have both incorporated push into their products.

A development indicating a degree of maturation of the field is Microsoft’s proposal of the Channel Definition Format (CDF) standard to the World Wide Web Consortium (W3C). CDF is a language that Web publishers can use to turn their content into “Channels” that can be exploited by push (or “pull++”) technologies. CDF allows the specification of metadata about a website, including a searchable title and abstract and information about the structure and update schedule of the site. A number of the major push vendors, such as PointCast, BackWeb, and AirMedia, have expressed support for the proposed standard. Such a standard raises the potential for push technology to be more widely integrated into the fabric of the Internet.

1.4 Sorting it All Out

The wide range of opinions on the pros and cons of push technology is understandable, given that it is a major departure from the way distributed information systems have traditionally been built. Adding to the noise, however, is a widespread confusion about the basic principles of push and where it fits into the world of data delivery. In this short paper we argue that this confusion stems from two fundamental causes. First, push is just one dimension of a larger design space of data delivery mechanisms. We identify three dimensions for data delivery mechanisms (push vs. pull is one of them) and show how different choices along these dimensions interact. Second, networked information systems can employ different data delivery options between different sets of information producers and consumers. Thus, complex systems will likely contain mixtures of push and pull (along with the other options) at various points in the network. In such a situation, it is inappropriate to identify an entire system as being “push-based” or “pull-based”.

In the following, we present an overview of our ideas on data dissemination in order to provide a framework for thinking about push technology in the larger context of networked information systems. Our intent is to clarify some of the issues surrounding push technology and to characterize the design space for data delivery in dissemination-based information systems and applications.
2 Fundamental Properties

In this section, we present an overview of data delivery, focusing on how the notion of data push fits in with the other dimensions of the design space for delivery mechanisms. We then describe why it is often inappropriate to refer to complex distributed systems as simply “push-based” or “pull-based”. A more detailed discussion of these issues can be found in [Fran97].

2.1 Options for Data Delivery

Support for different styles of data delivery allows a distributed information system to be optimized for various server, client, network, data, and application properties. We have identified three main characteristics that can be used to compare data delivery mechanisms: (1) push vs. pull; (2) periodic vs. aperiodic; and (3) unicast vs. 1-to-N. While there are numerous other dimensions that should be considered, such as fault-tolerance, ordering guarantees, error properties, network topology, etc., we have found that these three characteristics provide a good initial basis for discussing many popular approaches. In particular, we argue that all three of these characteristics must be considered in order to make intelligent choices about delivery mechanisms for specific situations. Figure 1 shows these characteristics and how several common mechanisms relate to them.
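Because the three characteristics are independent axes, the design space has 2 × 2 × 2 = 8 points. The Python sketch below encodes the axes and places a few of the mechanisms discussed in this section into the space; the class, field, and dictionary names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Initiator(Enum):
    PULL = "pull"  # the client initiates the transfer
    PUSH = "push"  # the server initiates the transfer


class Timing(Enum):
    APERIODIC = "aperiodic"  # triggered by an event (user action, data update)
    PERIODIC = "periodic"    # follows a pre-arranged schedule


class Fanout(Enum):
    UNICAST = "unicast"   # one sender, one receiver
    ONE_TO_N = "1-to-N"   # one sender, many receivers (multicast or broadcast)


@dataclass(frozen=True)
class DeliveryMode:
    initiator: Initiator
    timing: Timing
    fanout: Fanout

    def __str__(self) -> str:
        return f"{self.timing.value} {self.initiator.value} over {self.fanout.value}"


# Every point of the design space: 2 x 2 x 2 = 8 combinations.
DESIGN_SPACE = [DeliveryMode(i, t, f) for i, t, f in product(Initiator, Timing, Fanout)]

# A few mechanisms discussed in this section, placed in the space.
EXAMPLES = {
    "request/response": DeliveryMode(Initiator.PULL, Timing.APERIODIC, Fanout.UNICAST),
    "PointCast 1.0 client-server link": DeliveryMode(Initiator.PULL, Timing.PERIODIC, Fanout.UNICAST),
    "publish/subscribe over IP-Multicast": DeliveryMode(Initiator.PUSH, Timing.APERIODIC, Fanout.ONE_TO_N),
    "mailing-list digests": DeliveryMode(Initiator.PUSH, Timing.PERIODIC, Fanout.UNICAST),
    "Broadcast Disks": DeliveryMode(Initiator.PUSH, Timing.PERIODIC, Fanout.ONE_TO_N),
}

if __name__ == "__main__":
    for name, mode in EXAMPLES.items():
        print(f"{name}: {mode}")
```

Seen this way, calling a system “push-based” fixes only one of its three coordinates and says nothing about the other two.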

2.1.1 Client Pull vs. Server Push

We first focus on push vs. pull. Current database servers and object repositories manage data for clients that explicitly request data when they require it. When a request is received at a server, the server locates the information of interest and returns it to the client. This request-response style of operation is pull-based: the transfer of information from servers to clients is initiated by a client pull. In contrast, as discussed in the introduction, push-based data delivery involves sending information to a client population in advance of any specific request. With push-based delivery, the server initiates the transfer.
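The difference lies only in who initiates the transfer. The following is a minimal sketch, assuming in-process method calls stand in for network messages (all class and method names are hypothetical): the pull server does nothing until `get` is called, while the push server forwards each update to registered clients without being asked. Because the push here is triggered by a data update, it is also an instance of aperiodic push in the sense of Section 2.1.2.

```python
from typing import Callable, Dict, List


class PullServer:
    """Request/response: every transfer is initiated by a client request."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data
        self.requests_served = 0

    def get(self, key: str) -> str:
        # The server acts only when a request arrives.
        self.requests_served += 1
        return self.data[key]


class PushServer:
    """Server-initiated delivery: clients register once, then receive data
    in advance of any specific request."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.clients: List[Callable[[str, str], None]] = []

    def register(self, deliver: Callable[[str, str], None]) -> None:
        self.clients.append(deliver)

    def update(self, key: str, value: str) -> None:
        # A change at the server triggers the transfer; no client asked for it.
        self.data[key] = value
        for deliver in self.clients:
            deliver(key, value)


if __name__ == "__main__":
    pull = PullServer({"IBM": "101.5"})
    print(pull.get("IBM"))            # the client decides when to ask

    cache: Dict[str, str] = {}
    push = PushServer()
    push.register(cache.__setitem__)  # the client only registers interest
    push.update("IBM", "102.0")       # the server decides when to send
    print(cache)
```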

2.1.2 Aperiodic vs. Periodic

Both push and pull can be performed in either an aperiodic or a periodic fashion. Aperiodic delivery is event-driven: a data request (for pull) or transmission (for push) is triggered by an event such as a user action (for pull) or a data update (for push). In contrast, periodic delivery is performed according to some pre-arranged schedule. This schedule may be fixed, or may be generated with some degree of randomness.² An application that sends out stock prices on a regular basis is an example of periodic push, whereas one that sends out stock prices only when they change is an example of aperiodic push.

² For the purposes of this discussion, we do not distinguish between fixed and randomized schedules. Such a distinction is important in certain applications. For example, algorithms for conserving energy in mobile environments proposed by Imielinski et al. [Imie94] depend on a strict schedule to allow mobile clients to “doze” during periods when no data of interest to them will be broadcast.

2.1.3 Unicast vs. 1-to-N

The third characteristic of data delivery mechanisms is whether they are based on unicast or 1-to-N communication. With unicast communication, data items are sent from a data source (e.g., a single server) to one other machine, while 1-to-N communication allows multiple machines to receive the data sent by a data source.³

Two types of 1-to-N data delivery can be distinguished: multicast and broadcast. With multicast, data is sent to a specific subset of clients who have indicated their interest in receiving the data. Since the recipients are known, given a two-way communications medium it is possible to make multicast reliable; that is, network protocols can be developed that guarantee the eventual delivery of the message to all clients that should receive it. In contrast, broadcasting sends information over a medium on which an unidentified and possibly unbounded set of clients can listen.

³ Some systems attempt to implement a 1-to-N style of data delivery using unicast (i.e., by sending identical, individual messages to multiple clients). As discussed in Section 3, this type of pseudo-broadcast can result in tremendous bandwidth and server overload problems. For this reason, we classify such systems as “unicast-based” in our taxonomy.

2.2 Classification of Delivery Mechanisms

It is possible to classify many existing data delivery mechanisms using the characteristics described above. Such a classification is shown in Figure 1. We discuss several of the mechanisms below.

Figure 1: Data Delivery Options

Aperiodic Pull - Traditional request/response mechanisms use aperiodic pull over a unicast connection. If instead a 1-to-N connection is used, then clients can “snoop” on the requests made by other clients and obtain data that they haven’t explicitly asked for (e.g., see [Acha97, Akso98]).

Periodic Pull - In some applications, such as remote sensing, a system may periodically send requests to other sites to obtain status information or to detect changed values. If the information is returned over a 1-to-N link, then, as with request/response, other clients can snoop to obtain data items as they go by. Most existing Web or Internet-based “push” systems are actually implemented using Periodic Pull between the client machines and the data source(s).

Aperiodic Push - Publish/subscribe protocols are becoming a popular way to disseminate information in a network [Oki93, Yan95, Glan96]. In a publish/subscribe system, users provide information (sometimes in the form of a profile) indicating the types of information they wish to receive. Publish/subscribe is push-based, since data flow is initiated by the data sources, and aperiodic, since there is no predefined schedule for sending data. Publish/subscribe protocols are inherently 1-to-N in nature, but due to limitations in current Internet technology, they are often implemented using individual unicast messages to multiple clients. Examples of such systems include Internet e-mail lists and some existing “push” systems on the Internet. True 1-to-N delivery is possible through technologies such as IP-Multicast, but such solutions are typically limited to individual intranets or Local Area Networks.
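A publish/subscribe server of this kind can be sketched in a few lines. In this hypothetical Python sketch, a profile is modeled as a predicate over published items and, because no multicast facility is assumed, each match is delivered as a separate unicast copy; the counter makes the cost of that emulation visible. The names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Item = Dict[str, object]          # a published data item
Profile = Callable[[Item], bool]  # a subscriber's statement of interest


@dataclass
class Subscriber:
    name: str
    profile: Profile
    inbox: List[Item] = field(default_factory=list)


class Publisher:
    """Aperiodic push: publish() is driven by new data, not by a schedule
    or by a client request."""

    def __init__(self) -> None:
        self.subscribers: List[Subscriber] = []
        self.unicast_messages = 0  # identical copies sent to emulate 1-to-N

    def subscribe(self, sub: Subscriber) -> None:
        self.subscribers.append(sub)

    def publish(self, item: Item) -> int:
        """Deliver item to every subscriber whose profile matches it."""
        matches = [s for s in self.subscribers if s.profile(item)]
        for s in matches:
            # No multicast facility is assumed, so each match costs one
            # individual unicast message.
            s.inbox.append(item)
            self.unicast_messages += 1
        return len(matches)
```

With one subscriber interested in stocks and another interested in stocks and sports, publishing one stock item and one sports item costs three unicast messages; over a true 1-to-N link each item would be sent once.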

Periodic Push - Periodic push has been used for data dissemination in many systems. Internet mailing lists that send out “digests” on a regular schedule are an example of Periodic Push using unicast. For example, the Majordomo system allows a list manager to set up a schedule (e.g., weekly) for sending digests. Such digests allow users to follow a mailing list without being continually interrupted by individual messages. There have also been many systems that use Periodic Push over a broadcast or multicast link. These include TeleText [Amma85, Wong88], DataCycle [Herm87], Broadcast Disks [Acha95a, Acha95b], and mobile databases [Imie94].
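Periodic push over a broadcast link raises the question of how to order the broadcast cycle. The sketch below is modeled loosely on the multi-disk idea of Broadcast Disks [Acha95a]: pages are grouped into “disks” from hottest to coldest, each disk is given a relative broadcast frequency, and the cycle interleaves chunks so that hotter pages recur more often. The function names and the even chunk-splitting rule are our own simplifications, not a definitive rendering of that work.

```python
from functools import reduce
from math import gcd
from typing import List, Sequence


def _lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def _split(pages: Sequence[str], n: int) -> List[List[str]]:
    """Split pages into n contiguous chunks of (nearly) equal size."""
    size = len(pages)
    return [list(pages[k * size // n:(k + 1) * size // n]) for k in range(n)]


def broadcast_program(disks: Sequence[Sequence[str]],
                      rel_freq: Sequence[int]) -> List[str]:
    """Build one major cycle of a periodic-push broadcast program.

    disks    -- page groups ordered from hottest to coldest
    rel_freq -- how many times per cycle each disk's pages are broadcast
    """
    if len(disks) != len(rel_freq):
        raise ValueError("one relative frequency per disk is required")
    max_chunks = reduce(_lcm, rel_freq)
    # A disk broadcast f times per cycle is cut into max_chunks / f chunks.
    chunks = [_split(pages, max_chunks // f) for pages, f in zip(disks, rel_freq)]
    program: List[str] = []
    for i in range(max_chunks):       # one minor cycle per iteration
        for disk_chunks in chunks:    # hottest disk first
            program.extend(disk_chunks[i % len(disk_chunks)])
    return program


if __name__ == "__main__":
    print(" ".join(broadcast_program([["a"], ["b", "c"], ["d", "e", "f", "g"]], [4, 2, 1])))
```

For disks [a], [b, c], [d, e, f, g] with relative frequencies 4, 2, 1, the cycle is a b d a c e a b f a c g: page a is broadcast four times per cycle, b and c twice, and each of d through g once.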

2.3 End-to-End Considerations

The second source of confusion about push technology is the fact that networked information systems typically contain many interconnected nodes. These nodes may be (logically) organized in various structures, and different data delivery mechanisms may be used between different sets of nodes. Given the potential heterogeneity of delivery mechanisms in a complex system, it is often not appropriate to describe the entire end-to-end (i.e., data source to consumer) system as “push-based” or “pull-based”.

In general, a distributed information system can be thought of as having three types of nodes: (1) data sources, which provide the base data that is to be disseminated; (2) clients, which are net consumers of information; and (3) information brokers (or agents, mediators, etc.), which acquire information from other sources, add value to that information (e.g., some additional computation or organizational structure), and then distribute this information to other consumers. By creating hierarchies of brokers, information delivery can be tailored to the needs of many different users.

While the previous discussion has focused primarily on different modes of data delivery, the brokers provide the glue that binds these modes together. In many cases, the expected usage patterns of the brokers can drive the selection of which mode of delivery to use. For example, a broker that is typically very heavily loaded with requests could be an excellent candidate for a push-based delivery mechanism to its clients, since a single pushed copy of popular data can serve many clients that would otherwise each send their own request.

As we move upstream in the data delivery chain, brokers look like data sources to their clients. Receivers of information cannot detect the details of interconnections any further upstream than their immediate predecessor. This principle of network transparency allows data delivery mechanisms to change without having global impact. Suppose that node B is pulling data values from node A on demand. Further, suppose that node C is listening to a periodic broadcast from node B, which includes values that B has pulled from A. Node C will not have to change its data gathering strategy if A begins to push values to B. Changes in links are of interest only to the nodes that are directly involved. Likewise, this transparency allows the “appearance” of the data delivery at any node to differ from the way the data is actually delivered earlier in the network. This ability to change the appearance of data delivery is at the root of much of the confusion surrounding push technology.
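The A, B, C scenario can be made concrete. In this hypothetical Python sketch, node C only ever sees B's periodic broadcast; the call that switches the A-B link from pull to push touches A and B but nothing on C's side, which is the transparency property described above. All class and method names are illustrative assumptions.

```python
from typing import Callable, Dict, List

Listener = Callable[[Dict[str, float]], None]


class SourceA:
    """Data source: can be pulled, and can also push updates to registrants."""

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}
        self.push_targets: List[Callable[[str, float], None]] = []

    def read(self, key: str) -> float:           # pull interface
        return self.values[key]

    def write(self, key: str, value: float) -> None:
        self.values[key] = value
        for target in self.push_targets:         # push interface
            target(key, value)


class BrokerB:
    """Keeps a copy of A's data and rebroadcasts it periodically."""

    def __init__(self, upstream: SourceA, keys: List[str]) -> None:
        self.upstream = upstream
        self.keys = keys
        self.cache: Dict[str, float] = {}
        self.listeners: List[Listener] = []
        self.pulling = True

    def switch_to_push(self) -> None:
        # Only the A-B link changes; B's listeners are not involved.
        self.pulling = False
        self.upstream.push_targets.append(self.cache.__setitem__)

    def broadcast_tick(self) -> None:
        """One period of B's broadcast to C and any other listeners."""
        if self.pulling:                         # B pulls from A on demand
            for k in self.keys:
                self.cache[k] = self.upstream.read(k)
        snapshot = dict(self.cache)
        for listen in self.listeners:
            listen(snapshot)


if __name__ == "__main__":
    a = SourceA()
    a.write("IBM", 101.5)
    b = BrokerB(a, ["IBM"])
    seen_by_c: List[Dict[str, float]] = []
    b.listeners.append(seen_by_c.append)  # node C: just listens to B
    b.broadcast_tick()                    # B pulls from A, then broadcasts
    b.switch_to_push()                    # A now pushes to B
    a.write("IBM", 102.0)
    b.broadcast_tick()                    # C's code is unchanged
    print(seen_by_c)
```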

Figure 2 shows a simple example of the importance of considering multiple network components and the impact of transparency. The figure shows how data delivery is performed in the initial versions of PointCast. To the user sitting at the screen, the system appears to be “push-based”: data flows across the screen without any user intervention. Due to current limitations of the Internet, however, that data is actually brought over to the client machine using a stream of periodic pull requests, delivered in a unicast fashion. Thus, the implementation of PointCast 1.0 between the client and the PointCast server is actually the exact opposite of the view that is presented to the user in all three dimensions of the hierarchy of Figure 1. This situation is not unique to PointCast; in fact, it is true for virtually all of the Internet-based push solutions, and stems from the fact that current IP and HTTP protocols do not adequately support push or 1-to-N communication.

Figure 2: PointCast 1.0

3 Reexamining Current Push Technology

The previous section identified several of the sources of confusion in the current discussions and debate regarding push technology. In particular, the confusion stems from the mismatch between the user's perception and the actual data delivery mechanisms used by the system. Furthermore, this mismatch is also at the root of many of the performance concerns (particularly bandwidth overload) associated with current push technology. The impact of the mismatch on performance can be summarized as follows:

Pull instead of push - Current webcasting solutions typically use data pull to obtain information from data sources. This choice is due to limitations of the HTTP protocol, which is primarily pull-based. As stated previously, replacing push with pull requires that the pull be done in a polling manner. Polling can be quite resource intensive because it generates many requests. These requests consume client, server, and network resources. The problems are exacerbated if all clients poll individually, which could result in servers becoming overloaded due to the high volume of requests.
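The server load from polling grows with the product of the client population and the polling rate, regardless of how often the data actually changes. A back-of-the-envelope model in Python, assuming every client polls independently at a fixed period and the data changes at a steady rate (the workload figures are hypothetical):

```python
from typing import Dict


def polling_load(clients: int, poll_period_s: float,
                 updates_per_hour: float) -> Dict[str, float]:
    """Server load when every client polls independently.

    Hypothetical workload: each client polls once every poll_period_s
    seconds, and the data changes updates_per_hour times per hour.
    """
    polls_per_hour = clients * 3600 / poll_period_s
    # A poll can return something new only if an update happened since the
    # client's previous poll, so each client has at most one useful poll
    # per update.
    max_useful_per_hour = min(polls_per_hour, clients * updates_per_hour)
    return {
        "requests_per_second": polls_per_hour / 3600,
        "min_wasted_fraction": 1 - max_useful_per_hour / polls_per_hour,
    }
```

For 10,000 clients polling every 60 seconds, the server sees about 167 requests per second; if the data changes only 6 times per hour, at least 90% of those requests return nothing new.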

Periodic instead of aperiodic - Polling is typically done in a periodic manner that is independent of the events (e.g., data modifications) that would require data to be transferred. This independence results in a granularity problem: if polling is done too frequently, then the overhead can become substantial; if it is done too infrequently, then clients may unknowingly be accessing stale data.
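The granularity problem shows up when one hypothetical sequence of updates is replayed against two polling periods. The Python helper below assumes polls at t = period, 2·period, and so on, and that every update falls after time 0 and no later than the last poll; it reports how long, on average, an update waits before a poll picks it up, and how many polls find nothing new.

```python
import math
from typing import Sequence, Tuple


def poll_tradeoff(update_times: Sequence[float], period: float,
                  horizon: float) -> Tuple[float, int]:
    """Replay updates against a client polling at period, 2*period, ...

    Returns (mean delay before an update is seen,
             number of polls that found nothing new).
    """
    n_polls = int(horizon // period)
    # Each update is first seen by the next poll at or after it.
    first_seen = [math.ceil(t / period) * period for t in update_times]
    mean_delay = sum(s - t for s, t in zip(first_seen, update_times)) / len(update_times)
    # The polls that saw at least one update are the distinct first_seen times.
    empty_polls = n_polls - len(set(first_seen))
    return mean_delay, empty_polls
```

For updates at 5, 42, 43, and 170 seconds over a 180-second horizon, polling every 60 seconds gives a mean delay of 25 seconds with 1 of 3 polls empty, while polling every 10 seconds cuts the delay to 5 seconds but leaves 15 of 18 polls empty.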

Unicast instead of 1-to-N - In the absence of a true broadcast or multicast facility, systems that require 1-to-N behavior must implement it using multiple identical messages, one for each intended recipient. The potential bandwidth problems of such an approach are obvious: if n clients are interested in the same data item, then that same item must be sent over the network n times.

Fortunately, the concept of network transparency can be used to ameliorate this situation. One solution involves placing a local server inside an organization’s firewall. All the clients interact with the local server in the way that is most appropriate for the local network and system configuration. The local server can then poll the remote data source on behalf of the entire organization, which reduces Internet traffic. Likewise, the data source needs to send only a single copy of each data item to the local server, which can then distribute it to all the clients it represents, multicasting the data to them if such a capability exists.
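The effect of a local server on wide-area traffic can be sketched as follows, assuming (hypothetically) that each client would otherwise poll the remote source directly and that the local server relays each fetched item to its clients over the local network. The class names are illustrative, and a callback list stands in for local delivery or multicast.

```python
from typing import Callable, Dict, List


class RemoteSource:
    """Stands in for a data source on the far side of the Internet link."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self.data = data
        self.requests = 0
        self.bytes_sent = 0

    def fetch(self, key: str) -> bytes:
        self.requests += 1
        payload = self.data[key]
        self.bytes_sent += len(payload)
        return payload


class LocalServer:
    """Polls the remote source once on behalf of every local client and
    hands each client its copy over the local network."""

    def __init__(self, remote: RemoteSource) -> None:
        self.remote = remote
        self.clients: List[Callable[[str, bytes], None]] = []

    def poll(self, key: str) -> None:
        payload = self.remote.fetch(key)  # one wide-area request
        for deliver in self.clients:      # local fan-out (or multicast)
            deliver(key, payload)
```

With 500 clients and 60 polling rounds, direct polling sends 30,000 requests across the Internet link, whereas the local server sends 60, independent of how many clients sit behind it.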


4 Conclusions

In summary, push is currently a hot topic, but it is essential that it be placed in the proper context. Push is one choice (among many) for data delivery in distributed information systems. Push is not, for example, the same as broadcast. In fact, many existing push-based products are based on periodic pull over unicast connections. In our work on data dissemination, we have advocated a new look at the construction of distributed information systems that allows a seamless integration of all data delivery mechanisms, including, but not limited to, the various forms of push. We believe that this is a fertile area of work for the database community, since the use of careful data management techniques in this context can have a significant impact on overall system performance and usability.

MISSION OF AFRL/INFORMATION DIRECTORATE (IF)

The advancement and application of information systems science and technology for aerospace command and control and its transition to air, space, and ground systems to meet customer needs in the areas of Global Awareness, Dynamic Planning and Execution, and Global Information Exchange is the focus of this AFRL organization. The directorate’s areas of investigation include a broad spectrum of information and fusion, communication, collaborative environment and modeling and simulation, defensive information warfare, and intelligent information systems technologies.


